Abstract

Crowdsourcing systems, in which numerous tasks are electronically distributed to numerous
"information piece-workers", have emerged as an effective paradigm for human-powered solving
of large-scale problems in domains such as image classification, data entry, optical character
recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable,
nearly all crowdsourcers must devise schemes to increase confidence in their answers,
typically by assigning each task multiple times and combining the answers in some way such as
majority voting.

In this paper, we consider a general model of such crowdsourcing tasks and pose the problem 
of minimizing the total price (i.e., number of task assignments) that must be paid to achieve 
a target overall reliability. We give a new algorithm for deciding which tasks to assign to 
which workers and for inferring correct answers from the workers' answers. We show that 
our algorithm, inspired by belief propagation and low-rank matrix approximation, significantly 
outperforms majority voting and, in fact, is asymptotically optimal through comparison to an
oracle that knows the reliability of every worker. We consider both a one-shot scenario in which 
all questions are asked and answered simultaneously and an iterative scenario in which one may 
choose to gather additional responses to certain questions adaptively based on the responses 
collected thus far. Perhaps surprisingly, we show that the minimum price that must be paid
in order to achieve a certain reliability under both scenarios scales in the same manner, which
implies that there is no significant gain in asking questions adaptively.



1 Introduction 



Background. Crowdsourcing systems have emerged as an effective paradigm for human-powered
problem solving and are now in widespread use for large-scale data-processing tasks such as image
classification, video annotation, form data entry, optical character recognition, translation,
recommendation, and proofreading. Crowdsourcing systems such as Amazon Mechanical Turk^1 provide a market
where a "taskmaster" can submit batches of small tasks to be completed for a small fee by any
worker choosing to pick them up. For example, a worker may be able to earn a few cents by
indicating which images from a set of 30 are suitable for children (one of the benefits of crowdsourcing
is its applicability to such highly subjective questions).

Because the tasks are tedious and the pay is low, errors are common even among workers 
who make an effort. At the extreme, some workers are "spammers", submitting arbitrary answers 
independent of the question in order to collect their fee. Thus, all crowdsourcers need strategies 
to ensure the reliability of answers. Because the worker crowd is large, anonymous, and transient, 
it is generally difficult to build up a trust relationship with particular workers.^2 It is also difficult
to condition payment on correct answers, as the correct answer may never truly be known and 
delaying payment can annoy workers and make it harder to recruit them to your task. Instead, 
most crowdsourcers resort to redundancy, giving each task to multiple workers, paying them all 
irrespective of their answers, and aggregating the results by some method such as majority voting. 

For such systems there is a natural core optimization problem to be solved. Assuming the 
taskmaster wishes to achieve a certain reliability in their answers, how can they do so at minimum 
cost (which is equivalent to asking how they can do so while asking the fewest possible questions)? 

Several characteristics of crowdsourcing systems make this problem interesting. Workers are 
neither persistent nor identifiable; each batch of tasks will be solved by a worker who may be 
completely new and who you may never see again. Thus one cannot identify and reuse particularly 
reliable workers. Nonetheless, by comparing one worker's answer to others' on the same question, 
it is possible to draw conclusions about a worker's reliability, which can be used to weight their 
answers to other questions in their batch. However, batches must be of manageable size, obeying 
limits on the number of tasks that can be given to a single worker. 

Another interesting aspect of this problem is the choice of task assignments. Unlike many
inference problems, which make inferences based on a fixed set of signals, our algorithm can choose
which signals to measure by deciding which questions to include in which batches. In addition, there
are several plausible options: for example, we might choose to ask a few "pilot questions" of each
worker (just like a qualifying exam) to decide on the reliability of the worker. However, this only biases
the prior distribution of the workers' reliability. Another possibility is to first ask a few questions
and, based on the answers, decide whether to ask more questions. We would like to understand the
role of all such variations in the overall optimization of the budget for reliable task processing.

In the remainder of this introduction, we will define a formal model that captures these aspects
of the problem. We consider both a one-shot scenario, in which all questions are asked and answered
simultaneously and then the correct answers are inferred based on all the responses, and an iterative
scenario, in which one may choose to gather additional responses to certain questions adaptively
based on the responses collected thus far. Somewhat surprisingly, we show that the optimal costs
under both scenarios scale in the same manner. That is, asking questions iteratively does not help!
We design a non-adaptive algorithm which operates under the one-shot scenario, and show that
this approach is optimal for both scenarios. Hence, there is no gain in switching to an adaptive
algorithm. We provide an optimal task allocation scheme and an optimal algorithm for inference on
that task allocation. We will then show that our algorithm is asymptotically order-optimal under
both scenarios: for a given target error rate, it spends only a constant factor times the minimum
necessary to achieve that error rate. The optimality is established through comparisons to an oracle
that knows the reliability of every worker and can thus make optimal decisions. In particular, we
derive a parameter $q$ that characterizes the 'collective' reliability of the crowd, and show that to
achieve target reliability $\epsilon$ it is both necessary and sufficient to replicate each task
$\Theta((1/q)\log(1/\epsilon))$ times.

^1 http://www.mturk.com

^2 For certain high-value tasks, crowdsourcers can use entrance exams to "prequalify" workers and block spammers,
but this increases the cost of the task and still provides no guarantee that the workers will try hard after qualification.

Setup. We model a set of $m$ tasks $\{t_i\}_{i\in[m]}$, each associated with an unobserved 'correct'
solution $s_i \in \{\pm 1\}$. Here and after, we use $[N]$ to denote the set of the first $N$ integers. In the
image categorization example stated earlier, the tasks correspond to labeling $m$ images as suitable
for children ($+1$) or not ($-1$). These tasks are assigned to $n$ anonymous workers from the crowd,
which we denote by $\{w_j\}_{j\in[n]}$.

When a task is assigned to a worker, we get a possibly inaccurate response. We use $A_{ij} \in \{0, \pm 1\}$
to denote this response on task $t_i$ from worker $w_j$. To simplify notation, we use $0$ to denote that
the particular task is not assigned to that worker. Some workers are more diligent or have more
expertise than others, while some other workers might be spammers. We choose a simple model to
capture this diversity in workers' reliability: we assume that each worker $w_j$ is characterized by a
reliability $p_j \in [0, 1]$, and that they make errors randomly on each question they answer. Precisely,
if task $t_i$ is assigned to worker $w_j$ then



$$A_{ij} = \begin{cases} s_i & \text{with probability } p_j, \\ -s_i & \text{with probability } 1 - p_j, \end{cases}$$

and $A_{ij} = 0$ if $t_i$ is not assigned to $w_j$. The random variable $A_{ij}$ is independent of any other
event given $p_j$. (Throughout this paper, we use boldface characters to denote random variables
and random matrices unless it is clear from the context.) The underlying assumption here is that
the error probability of a worker does not depend on the particular task and all the tasks share an
equal level of difficulty. Hence, each worker's performance is consistent across different tasks. We
discuss a possible generalization of this model in Section 2.5.
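The response model above can be simulated directly. The following is a minimal sketch, not code from the paper: the function names and the sparse-dictionary representation (unassigned pairs are simply absent, standing in for $A_{ij} = 0$) are assumptions made for illustration.

```python
import random

def simulate_response(s_i, p_j, rng):
    """Answer of a worker with reliability p_j on a task whose true label is s_i."""
    # Correct with probability p_j, flipped otherwise.
    return s_i if rng.random() < p_j else -s_i

def simulate_responses(s, p, edges, rng):
    """Return {(i, j): A_ij} for assigned pairs only; missing pairs mean A_ij = 0."""
    return {(i, j): simulate_response(s[i], p[j], rng) for (i, j) in edges}

rng = random.Random(0)
s = [+1, -1, +1]                 # hidden correct answers (illustrative)
p = [1.0, 0.5]                   # worker 0 is always right, worker 1 guesses
edges = [(i, j) for i in range(3) for j in range(2)]
A = simulate_responses(s, p, edges, rng)
```

Because each answer depends only on $s_i$ and $p_j$, the sketch reflects the assumption that all tasks are equally difficult for a given worker.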

To distribute tasks to anonymous workers from the crowd, the taskmaster creates batches of 
tasks, which are distributed through a crowdsourcing platform through an open call to an unidentified
pool of workers. Each batch, consisting of a small set of tasks, is picked up by any worker
choosing to complete it. Although we have no control over who gets to work on which batch, we 
have complete control over which tasks are included in which batch, which we refer to as the choice 
of task assignment. Assigning tasks to the batches amounts to choosing a bipartite graph G with m 
task nodes and n batch nodes (or worker nodes), where an edge indicates that the task is assigned 
to that particular batch. Equivalently, we can think of the graph as connecting tasks to workers, 
since each batch is to be completed by a single worker (who is not identified at the time of graph 
generation). Since we do not have any control over who picks up the batches, we assume that each 
batch is completed by a worker who is drawn from an i.i.d. distribution and that we do not have 
any a priori information on how reliable that worker is. 
Formally, we assume that the reliabilities of the workers $\{p_j\}_{j\in[n]}$ are independent and identically
distributed random variables with a given distribution on $[0, 1]$. One example is the spammer-hammer
model, where each worker is either a 'hammer' with probability $q$ or a 'spammer' with probability
$1 - q$. A hammer answers all questions correctly, in which case $p_j = 1$, and a spammer gives random
answers, in which case $p_j = 1/2$. Another example is the beta distribution with some parameters
$\alpha > 0$ and $\beta > 0$ ($f(p) = p^{\alpha-1}(1 - p)^{\beta-1} / B(\alpha, \beta)$ for a proper normalization $B(\alpha, \beta)$). Given this
random variable $p_j$, we define an important parameter $q \in [0, 1]$, which captures the 'collective
quality' of the crowd:

$$q = \mathbb{E}[(2p_j - 1)^2].$$

A value of $q$ close to one indicates that a large proportion of the workers are diligent, whereas $q$ close
to zero indicates that there are many spammers in the crowd. The definition of $q$ is consistent with
the use of $q$ in the spammer-hammer model, and in the case of the beta distribution,
$q = 1 - 4\alpha\beta/((\alpha + \beta)(\alpha + \beta + 1))$. We will see later that our bound on the error rate of our inference algorithm holds
for any distribution of $p_j$ but depends on the distribution only through this parameter $q$.
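As a quick check of these two examples, the sketch below evaluates the collective quality for the spammer-hammer model and for the beta distribution, and compares the beta closed form with the expansion $\mathbb{E}[(2p-1)^2] = 4\mathbb{E}[p^2] - 4\mathbb{E}[p] + 1$. The function names are illustrative assumptions; the beta moments used in the usage lines are the standard ones.

```python
def q_spammer_hammer(hammer_prob):
    """q when p_j = 1 with probability hammer_prob and p_j = 1/2 otherwise."""
    # (2p - 1)^2 equals 1 for a hammer and 0 for a spammer.
    return hammer_prob * 1.0 + (1.0 - hammer_prob) * 0.0

def q_beta(alpha, beta):
    """Closed form for q when p_j ~ Beta(alpha, beta)."""
    return 1.0 - 4.0 * alpha * beta / ((alpha + beta) * (alpha + beta + 1.0))

def q_from_moments(mean_p, mean_p_sq):
    """q from the first two moments of p_j: 4 E[p^2] - 4 E[p] + 1."""
    return 4.0 * mean_p_sq - 4.0 * mean_p + 1.0

# Beta(2, 1): E[p] = 2/3 and E[p^2] = alpha (alpha + 1) / ((alpha + beta)(alpha + beta + 1)) = 1/2.
q_closed = q_beta(2.0, 1.0)
q_moment = q_from_moments(2.0 / 3.0, 0.5)
```

Both routes give $q = 1/3$ for Beta(2, 1), and the spammer-hammer value is simply the fraction of hammers.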

It is quite realistic to assume the existence of a prior distribution for $p_j$. The model is therefore
quite general: in particular, it is met if we simply randomize the order in which we upload our
task batches, since this will have the effect of randomizing which workers perform which batches,
yielding a distribution that meets our requirements. On the other hand, it is not realistic to assume
that we know what the prior is. To execute our inference algorithm for a given number of iterations,
we do not require any knowledge of the distribution of the reliability. However, $q$ is necessary in
order to determine how many times a task should be replicated and how many iterations we need
to run to achieve a certain reliability. We discuss a simple way to overcome this limitation in Section
2.3.

We first consider a one-shot scenario, in which all questions are asked simultaneously and then 
an estimation is performed after all the answers are obtained. In particular, we do not allow 
allocating tasks adaptively based on the answers received thus far (e.g. which workers are more 
reliable or which tasks we have less confidence in). In practice, latency is a crucial factor, and to be 
able to deliver a solution in time, all the tasks are typically assigned at once so that all the batches 
are processed in parallel. 

However, it is also of great practical interest to ask how much we can gain by using an adaptive 
task allocation scheme. Hence, we next compare our approach to a more general class of schemes 
that operate under an iterative scenario, in which one can adaptively assign tasks. In this case, one 
might be tempted to first identify which workers are more reliable and then assign all the tasks to 
those workers in an explore/exploit manner. However, in crowdsourcing, it is unrealistic to assume 
that we can identify and reuse any particular worker, including the reliable ones, since typical 
workers are neither persistent nor identifiable and batches are distributed through an open-call. 
Hence, under an iterative scenario in crowdsourcing, although we cannot reuse any of the workers, 
we can adaptively assign more workers to those tasks that we have less confidence in, by assigning 
those tasks to the new batches that we create adaptively. These new batches are then distributed 
to unidentified workers through an open-call using a crowdsourcing platform. 

Prior Work. Previously, crowdsourcing system designs have focused on developing inference 
algorithms assuming that the graph is fixed and the workers' responses are already given. None of 
the prior work on crowdsourcing provides any systematic treatment of the task allocation. To the 
best of our knowledge, we are the first to study both aspects of crowdsourcing together and, more
importantly, establish optimality. 

A naive approach to solve the inference problem, which is widely used in practice, is majority 
voting. Majority voting simply follows what the majority of workers agree on. When we have many 
spammers in the crowd, majority voting is error-prone since it gives the same weight to all the 
responses, regardless of whether they are from a spammer or a diligent worker. We will show in
Section 2.3 that majority voting is provably sub-optimal and can be significantly improved upon.

If we know how reliable each worker is, then it is straightforward to find the optimal estimates:
compute the weighted sum of the responses weighted by the log-likelihood. Although, in reality,
we do not have this information, it is possible to learn about a worker's reliability by comparing
one worker's answers to those of others. This idea was first proposed by Dawid and Skene, who
introduced an iterative algorithm based on expectation maximization (EM) [DS79]. This heuristic
algorithm iterates the following two steps. In the M-step, the algorithm estimates the error
probabilities of the workers that maximize the likelihood using the current estimates of the answers.
In the E-step, the algorithm estimates the likelihood of the answers using the current estimates of
the error probabilities.
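To make the contrast between majority voting and the oracle estimate concrete, here is a small sketch. It assumes the oracle weight for worker $j$ is the log-likelihood ratio $\log(p_j/(1-p_j))$, which is the usual reading of "weighted by the log-likelihood" for binary answers; the function names and the clipping constant are illustrative, and a return value of 0 marks a tie to be broken by a coin flip.

```python
import math

def majority_vote(answers):
    """Unweighted vote over +1/-1 answers; returns 0 on a tie."""
    total = sum(answers)
    return 1 if total > 0 else -1 if total < 0 else 0

def oracle_vote(answers, reliabilities, clip=1e-12):
    """Vote weighted by log(p / (1 - p)), assuming the reliabilities are known."""
    score = 0.0
    for a, p in zip(answers, reliabilities):
        p = min(max(p, clip), 1.0 - clip)   # keep the logarithm finite at p = 0 or 1
        score += a * math.log(p / (1.0 - p))
    return 1 if score > 0 else -1 if score < 0 else 0

# One diligent worker says +1, two spammers say -1.
answers = [+1, -1, -1]
reliabilities = [0.95, 0.5, 0.5]
by_majority = majority_vote(answers)
by_oracle = oracle_vote(answers, reliabilities)
```

In this example the spammers carry zero weight under the oracle rule, so the two estimates disagree.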

More recently, a number of algorithms followed this EM approach based on a variety of
probabilistic models [SFB+95, WRW+09, RYZ+10]. The EM algorithm has also been widely applied
in classification problems, where a set of labels from low-cost noisy labelers is used to find a good
classifier [JG03, RYZ+10]. Given a fixed budget, there is a trade-off between acquiring a larger
training dataset or acquiring a smaller dataset but with more labels per data point. Through
extensive experiments, Sheng, Provost and Ipeirotis [SPI08] show that repeated labeling can
give considerable advantage.

Despite the popularity of the EM algorithms, the performance of these approaches is only
empirically evaluated and there is no analysis that gives performance guarantees. In particular,
EM algorithms are highly sensitive to the initialization used, making it difficult to predict the 
quality of the resulting estimate. Further, the role of the graph structure G is not at all understood 
with the EM algorithm (or for that matter any other algorithm). 

Contributions. In this work, we provide the first rigorous treatment of both aspects of designing
a reliable and cost-efficient crowdsourcing system: task allocation and inference. Our approach aims to
minimize the budget needed to complete a set of tasks at a certain target reliability.
We provide both an asymptotically optimal graph construction (based on random graphs) and
an asymptotically optimal algorithm for inference (based on low-rank approximation and belief
propagation) on that graph. As the main result, we show that our algorithm performs as well
as an oracle estimator. The surprise lies in the fact that we establish this result by comparing
our algorithm with an oracle estimator which is free to choose any graph and performs optimal
estimation on that graph based on the information provided by an oracle about the reliability of every
worker.

We further show that the budget necessary to achieve a certain reliability in the answers using 
an oracle estimator scales in the same manner for both one-shot and iterative scenarios. Our 
non-adaptive approach still performs as well as the best possible algorithm, even if we allow this
best algorithm to assign tasks adaptively. Hence, there is no significant gain in using an adaptive 
strategy. 

Another novel contribution of our work is the analysis technique. The iterative inference
algorithm we introduce operates on real-valued messages whose distribution is a priori difficult to
analyze using standard techniques. To overcome this challenge, we develop a novel technique
of establishing that these messages are sub-Gaussian and computing the parameters recursively in
closed form. This allows us to prove a sharp result on the error rate. This technique could be of
independent interest in analyzing a more general class of message-passing algorithms.

2 Main Result 

To achieve a certain reliability in our answers with minimum cost, we want to design a reliable and 
cost-efficient crowdsourcing system. In what follows, we propose using random regular graphs for 
task assignment and introduce a novel iterative algorithm to infer the correct answers. We prove 
a bound on the resulting error and show that our approach is asymptotically order-optimal under
both the one-shot and iterative scenarios: it requires only a constant factor times the minimum
necessary budget to achieve a target error rate.

2.1 Algorithm 

Task allocation. Assigning tasks amounts to designing a bipartite graph $G(\{t_i\}_{i\in[m]} \cup \{w_j\}_{j\in[n]}, E)$,
where each edge corresponds to a task-worker assignment. We propose using a regular bipartite
graph chosen uniformly at random with bounded degrees. Sparse random graphs have several good
properties. In the large system limit, sparse random graphs converge locally to a tree, which allows
us to do sharp analysis of our algorithm. But more importantly, random graphs are excellent
expanders with large spectral gaps, which enables us to reliably separate the low-rank structure from
the data which is perturbed by random noise.

As a taskmaster, one first makes a choice of how many workers to assign to each task (the
left degree $l$) and how many tasks to assign to each worker (the right degree $r$). Since the total
number of edges has to be consistent, the number of workers $n$ directly follows from $ml = nr$. To
generate an $(l, r)$-regular bipartite graph we use a random graph generation scheme known as the
configuration model in the random graph literature [RU08, Bol01].
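The configuration model can be sketched as pairing "half-edges": each task contributes $l$ stubs, each worker contributes $r$ stubs, and a uniformly random matching of the two lists defines the edges. The code below is an illustrative sketch under that reading; the function name is hypothetical, and it does not remove the repeated task-worker pairs that the configuration model can occasionally produce.

```python
import random

def configuration_model(m, l, r, rng):
    """Sample an (l, r)-regular bipartite multigraph on m tasks and n = m*l/r workers."""
    if (m * l) % r != 0:
        raise ValueError("m * l must be divisible by r so that n = m * l / r is an integer")
    n = m * l // r
    task_stubs = [i for i in range(m) for _ in range(l)]     # l half-edges per task
    worker_stubs = [j for j in range(n) for _ in range(r)]   # r half-edges per worker
    rng.shuffle(worker_stubs)                                # uniformly random matching
    return n, list(zip(task_stubs, worker_stubs))

n, edges = configuration_model(m=12, l=3, r=4, rng=random.Random(1))
```

Every task ends up with exactly $l$ incident edges and every worker with exactly $r$, so the edge count is $ml = nr$ by construction.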

Inference algorithm. To solve the inference problem, we introduce a novel iterative algorithm
which is inspired by the celebrated belief propagation algorithm and low-rank matrix approximation.
This algorithm operates on real-valued task messages $\{x_{i\to j}\}_{(i,j)\in E}$ and worker messages
$\{y_{j\to i}\}_{(i,j)\in E}$. It starts with the worker messages initialized as independent Gaussian random
variables. The algorithm is not sensitive to a specific initialization as long as it has a strictly positive
mean. At each iteration $k \in \{1, 2, \ldots\}$, the messages are updated according to

$$x^{(k)}_{i\to j} = \sum_{j'\in\partial i\setminus j} A_{ij'}\, y^{(k-1)}_{j'\to i}, \quad \text{for all } (i,j)\in E, \text{ and}$$

$$y^{(k)}_{j\to i} = \sum_{i'\in\partial j\setminus i} A_{i'j}\, x^{(k)}_{i'\to j}, \quad \text{for all } (i,j)\in E,$$

where $\partial i$ is the neighborhood of $t_i$ and $\partial j$ is the neighborhood of $w_j$. Intuitively, a worker message
$y_{j\to i}$ represents our belief on how 'reliable' the worker $j$ is, such that our final estimate is a weighted
sum of the answers weighted by each worker's reliability:

$$\hat{s}_i = \mathrm{sign}\Big(\sum_{j\in\partial i} A_{ij}\, y_{j\to i}\Big).$$

It is understood that when there is a tie we flip a fair coin to make a decision.

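Iterative Algorithm

Input: $E$, $\{A_{ij}\}_{(i,j)\in E}$, $k_{\max}$
Output: Estimate $\hat{s}(\{A_{ij}\})$
1: For all $(i,j) \in E$ do
     Initialize $y^{(0)}_{j\to i}$ with random $Z_{ij} \sim \mathcal{N}(1, 1)$;
2: For $k = 1, \ldots, k_{\max}$ do
     For all $(i,j) \in E$ do $x^{(k)}_{i\to j} \leftarrow \sum_{j'\in\partial i\setminus j} A_{ij'}\, y^{(k-1)}_{j'\to i}$;
     For all $(i,j) \in E$ do $y^{(k)}_{j\to i} \leftarrow \sum_{i'\in\partial j\setminus i} A_{i'j}\, x^{(k)}_{i'\to j}$;
3: For all $i \in [m]$ do $x_i \leftarrow \sum_{j\in\partial i} A_{ij}\, y^{(k_{\max}-1)}_{j\to i}$;
4: Output estimate vector $\hat{s}(\{A_{ij}\}) = [\mathrm{sign}(x_i)]$.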
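The steps of the iterative algorithm can be sketched in plain Python as follows. This is an illustrative reading of the listing, not the authors' implementation: the dictionary layout, the function name, and the optional `init` hook (added here only so the initialization can be made deterministic) are assumptions, and the final step is taken to use the worker messages from iteration $k_{\max} - 1$. The graph is assumed to have no repeated task-worker pairs.

```python
import random

def iterative_inference(A, k_max, init=None, rng=None):
    """Run k_max >= 1 rounds of the message updates.

    A maps each assigned pair (i, j) to the answer +1 or -1.
    Returns {task i: estimate in {+1, -1}}.
    """
    if k_max < 1:
        raise ValueError("k_max must be at least 1")
    rng = rng or random.Random()
    # Default initialization: independent N(1, 1) worker messages.
    init = init or (lambda: rng.gauss(1.0, 1.0))

    task_nbrs, worker_nbrs = {}, {}
    for (i, j) in A:
        task_nbrs.setdefault(i, []).append(j)
        worker_nbrs.setdefault(j, []).append(i)

    y = {edge: init() for edge in A}            # y[(i, j)] stores y_{j -> i}
    for _ in range(k_max):
        y_prev = y
        # Task update: sum over all neighbours, then remove the edge's own term.
        x = {}
        for i, js in task_nbrs.items():
            total = sum(A[i, jp] * y_prev[i, jp] for jp in js)
            for j in js:
                x[i, j] = total - A[i, j] * y_prev[i, j]
        # Worker update, using the freshly computed task messages.
        y = {}
        for j, tasks in worker_nbrs.items():
            total = sum(A[ip, j] * x[ip, j] for ip in tasks)
            for i in tasks:
                y[i, j] = total - A[i, j] * x[i, j]

    estimates = {}
    for i, js in task_nbrs.items():
        x_i = sum(A[i, j] * y_prev[i, j] for j in js)   # uses y from round k_max - 1
        estimates[i] = 1 if x_i > 0 else -1 if x_i < 0 else rng.choice([-1, 1])
    return estimates
```

With three workers who always answer correctly and one who always answers incorrectly, the wrong worker's messages turn negative after one round, so its answers are effectively flipped rather than merely ignored.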

We emphasize here that our inference algorithm requires no information about the prior
distribution of the workers' quality $p_j$. Our algorithm is inspired by power iteration used to compute
the leading singular vectors of a matrix, and we discuss the connection in detail in Section 2.6.

While our algorithm is also inspired by the standard Belief Propagation (BP) algorithm for
approximating max-marginals [Pea88, YFW03], our algorithm is original and overcomes a few critical
limitations of the standard BP. First, the iterative algorithm does not require any knowledge of
the prior distribution of $p_j$, whereas the standard BP requires the knowledge of the distribution.
Second, there is no efficient way to implement standard BP, since we need to pass sufficient
statistics (or messages) which under our general model are distributions over the reals. On the other
hand, the iterative algorithm only passes messages that are real numbers regardless of the prior
distribution of $p_j$, which makes it easy to implement. Third, the iterative algorithm is provably
asymptotically order-optimal. Density evolution is a standard technique to analyze the
performance of BP. Although we can write down the density evolution for the standard BP, we cannot
describe or compute the densities, analytically or numerically. It is also very simple to write down 
the density evolution equations (cf. ([s])) for our algorithm, but it is not a priori clear how one can 
analyze the densities in this case either. We develop a novel technique to analyze the densities for 
our iterative algorithm and prove optimality. This technique could be of independent interest to 
analyzing a broader class of message-passing algorithms. 



2.2 Performance guarantee 

We state the main analytical result of this paper: for random $(l, r)$-regular bipartite graph based
task assignments with our iterative inference algorithm, the probability of error decays exponentially
in $lq$, up to a universal constant and for a broad range of the parameters $l$, $r$ and $q$. With a
reasonable choice of $l = r$ and both scaling like $(1/q)\log(1/\epsilon)$, the proposed algorithm is guaranteed
to achieve error less than $\epsilon$ for any $\epsilon \in (0, 1/2)$. Further, an algorithm-independent lower bound
that we establish suggests that such an error dependence on $lq$ is unavoidable. Hence, in terms of
the total budget, our algorithm is order-optimal. The precise statements follow next.

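Define a parameter $\mu = \mathbb{E}[2p_j - 1]$ and recall that $q = \mathbb{E}[(2p_j - 1)^2]$. To lighten the notation,
let $\hat{l} = l - 1$ and $\hat{r} = r - 1$. Define

$$\rho_k^2 = \frac{2q}{\mu^2 (q^2\hat{l}\hat{r})^{k-1}} + \left(3 + \frac{1}{q\hat{r}}\right) \frac{1 - (1/(q^2\hat{l}\hat{r}))^{k-1}}{1 - (1/(q^2\hat{l}\hat{r}))}.$$

For $q^2\hat{l}\hat{r} > 1$, let $\rho_\infty^2 = \lim_{k\to\infty} \rho_k^2$ such that

$$\rho_\infty^2 = \left(3 + \frac{1}{q\hat{r}}\right) \frac{q^2\hat{l}\hat{r}}{q^2\hat{l}\hat{r} - 1}.$$

Then we can show the following bound on the probability of making an error.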

Theorem 2.1. For fixed $l > 1$ and $r > 1$, assume that $m$ tasks are assigned to $n = ml/r$ workers
according to a random $(l, r)$-regular graph drawn from the configuration model. If the distribution
of the worker reliability satisfies $\mu = \mathbb{E}[2p_j - 1] > 0$ and $q^2 > 1/(\hat{l}\hat{r})$, then for any $s \in \{\pm 1\}^m$, the
estimates from $k$ iterations of the iterative algorithm achieve

$$\lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m} \mathbb{P}\big(s_i \neq \hat{s}_i(\{A_{ij}\}_{(i,j)\in E})\big) \le e^{-lq/(2\rho_k^2)}. \tag{1}$$

As we increase $k$, the above bound converges to a non-trivial limit.



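Corollary 2.2. Under the hypotheses of Theorem 2.1,

$$\lim_{k\to\infty} \lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m} \mathbb{P}\big(s_i \neq \hat{s}_i(\{A_{ij}\}_{(i,j)\in E})\big) \le e^{-lq/(2\rho_\infty^2)}. \tag{2}$$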



Even if we fix the value of $q = \mathbb{E}[(2p_j - 1)^2]$, different distributions of $p_j$ can have different
values of $\mu$ in the range of $[q, \sqrt{q}]$. Surprisingly, the asymptotic bound on the error rate does not
depend on $\mu$. Instead, as long as $q$ is fixed, $\mu$ only affects how fast the algorithm converges (cf.
Lemma 2.3).
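The bounds in Theorem 2.1 and Corollary 2.2 are easy to evaluate numerically. The sketch below transcribes $\rho_k^2$, its limit $\rho_\infty^2$, and the resulting error bound; the function names and the example parameters ($l = r = 15$, $q = 0.3$) are illustrative assumptions, not values from the paper.

```python
import math

def rho_sq(k, l, r, q, mu):
    """rho_k^2 from the definition preceding Theorem 2.1 (requires q^2 (l-1)(r-1) > 1, mu > 0)."""
    l_hat, r_hat = l - 1, r - 1
    g = q * q * l_hat * r_hat                      # q^2 * l_hat * r_hat
    if g <= 1 or mu <= 0:
        raise ValueError("need q^2 (l-1)(r-1) > 1 and mu > 0")
    transient = 2.0 * q / (mu ** 2 * g ** (k - 1))
    steady = (3.0 + 1.0 / (q * r_hat)) * (1.0 - (1.0 / g) ** (k - 1)) / (1.0 - 1.0 / g)
    return transient + steady

def rho_sq_limit(l, r, q):
    """Limit of rho_k^2 as k grows; note that mu does not appear."""
    l_hat, r_hat = l - 1, r - 1
    g = q * q * l_hat * r_hat
    if g <= 1:
        raise ValueError("need q^2 (l-1)(r-1) > 1")
    return (3.0 + 1.0 / (q * r_hat)) * g / (g - 1.0)

def error_bound(l, q, rho2):
    """Right-hand side exp(-l q / (2 rho^2)) of bounds (1) and (2)."""
    return math.exp(-l * q / (2.0 * rho2))

limit = rho_sq_limit(15, 15, 0.3)
after_one = rho_sq(1, 15, 15, 0.3, mu=0.3)     # equals 2q / mu^2
after_many = rho_sq(60, 15, 15, 0.3, mu=0.3)   # close to the limit
```

Changing `mu` alters the early values of `rho_sq` but not the value it settles to, which matches the remark that $\mu$ only affects the speed of convergence.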

Next, we make a few remarks on the performance guarantee. 

First, the iterative algorithm is efficient, with run-time comparable to that of majority voting,
which requires $O(ml)$ operations. Each iteration of the iterative algorithm requires $O(ml)$
operations, and we need $O(\log(q/\mu^2) / \log(q^2\hat{l}\hat{r}))$ iterations to ensure an error bound which scales as
in (2).



Lemma 2.3. Under the hypotheses of Theorem 2.1, the total computational cost sufficient to
achieve the bound in Corollary 2.2 up to any constant factor in the exponent is $O(ml \log(q/\mu^2) / \log(q^2\hat{l}\hat{r}))$.

By definition, we have $q \le \mu \le \sqrt{q}$. The runtime is the worst when $\mu = q$, which happens
under the spammer-hammer model, and it is the best when $\mu = \sqrt{q}$, which happens if $p_j =
(1 + \sqrt{q})/2$ deterministically. There exists a (non-iterative) polynomial time algorithm with runtime
independent of $q$ for computing the estimate which achieves (2), but in practice we expect that
the number of iterations needed is small enough that the iterative algorithm will outperform this
non-iterative algorithm.

Figure 1: The iterative algorithm improves over majority voting and the EM algorithm [SPI08]. (Horizontal axis: number of assignments per task, $l$.)

Second, the assumption that $\mu > 0$ is necessary. If there is no assumption on $\mu$, then we cannot
distinguish whether the responses came from tasks with $\{s_i\}_{i\in[m]}$ and workers with $\{p_j\}_{j\in[n]}$ or tasks with
$\{-s_i\}_{i\in[m]}$ and workers with $\{1 - p_j\}_{j\in[n]}$. Statistically, both of them give the same output. The
hypothesis on $\mu$ allows us to distinguish which of the two is the correct solution. In the case when
we know that $\mu < 0$, we can use the same algorithm changing the sign of the final output and get
the same performance guarantee.

Third, our algorithm does not require any information on the distribution of pj. Further, 
unlike previous approaches based on Expectation Maximization (EM), the iterative algorithm is 
not sensitive to initialization and converges to a unique solution from a random initialization with 
high probability. This follows from the fact that the algorithm is essentially computing a leading 
eigenvector of a particular linear operator. 

Finally, we observe a phase transition at $\hat{l}\hat{r}q^2 = 1$. Above this phase transition, when $\hat{l}\hat{r}q^2 > 1$,
we will show that our algorithm is order-optimal and the probability of error is significantly smaller
than majority voting. However, perhaps surprisingly, when we are below the threshold, when
$\hat{l}\hat{r}q^2 < 1$, we empirically observe that our algorithm exhibits a fundamentally different behavior.
The error we get after k iterations of our algorithm increases with k. In this regime, we are better 
off stopping the algorithm after 1 iteration, in which case the estimate we get is essentially the 
same as the simple majority voting, and we cannot do better than majority voting. This phase 
transition is universal and we observe similar behavior with other inference algorithms including 
expectation maximization approaches. 

This is illustrated in Figure 1. We ran 10 iterations of expectation maximization and our
iterative algorithm, and compared the performance to majority voting and the oracle estimator. For
this numerical simulation, we chose $l = r$ and a distribution of the workers such that $q = 0.3$.
Hence, we observe the phase transition around $l = 1 + 1/0.3 = 4.3333$.

We also ran experiments with a real crowd using Amazon Mechanical Turk. In our experiment
with colors, we created tasks for comparing colors, each task showing three colors, one on the top
and two on the bottom. We asked the crowd to indicate "if the color on the top is more similar
to the color on the left or on the right". We created 50 such tasks and recruited 28 workers to
answer all the questions. The ground truth, in this case, is chosen based on the distances in the
Lab color space between the two pairs of colors, which is a good measure of the perceived distance
between a pair of colors [WS67]. Once we have this data, we can subsample the data to simulate
what would have happened if we had collected a smaller number of responses per task, while keeping
the number of tasks and number of workers fixed. The resulting average probability of error is
illustrated in Figure 2. For this crowd, we can estimate the collective quality from the data,
which is about $q \approx 0.175$. Theoretically, this indicates that the phase transition should happen when
$(l - 1)((50/28)l - 1)q^2 = 1$, since we set $r = (50/28)l$. With this, we expect the phase transition to
happen around $l \approx 5$. In Figure 2, we see that the phase transition happens around $l = 8$.

Figure 2: Real experimental results on color comparison using Amazon Mechanical Turk.
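Both threshold values quoted above follow from solving $(l - 1)(\text{ratio}\cdot l - 1)q^2 = 1$ for $l$, where $r = \text{ratio}\cdot l$. The short sketch below does this with the quadratic formula; the function name and the `ratio` parameter are illustrative.

```python
import math

def threshold_degree(q, ratio=1.0):
    """Positive root l of (l - 1)(ratio * l - 1) q^2 = 1, where r = ratio * l."""
    # Expanding gives ratio * l^2 - (ratio + 1) * l + (1 - 1/q^2) = 0.
    a = ratio
    b = -(ratio + 1.0)
    c = 1.0 - 1.0 / q ** 2
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

simulated = threshold_degree(0.3)                 # l = r, q = 0.3
mturk = threshold_degree(0.175, ratio=50 / 28)    # 50 tasks, 28 workers, q about 0.175
```

The first call reproduces $l = 1 + 1/0.3$, and the second gives a value slightly above 5, consistent with the estimate in the text.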

2.3 Optimality under the one-shot scenario 

As a taskmaster, the natural core optimization problem of our concern is how to achieve a certain
reliability in our answers with minimum cost. The cost is proportional to the total number of
assignments, which is the number of edges of the graph $G$. We show that our algorithm is
asymptotically order-optimal for a broad range of the problem parameters $l$, $r$ and $q$. For a given target
error rate $\epsilon$, the total budget sufficient to achieve this target error rate using our algorithm is within
a constant factor of what is necessary using any graph and the oracle estimator. In this section,
we compare our approach to an oracle estimator that operates under the one-shot scenario. The
optimality of our approach under the iterative scenario is discussed in the next section.

Formally, consider a scenario where there are $m$ tasks to complete and a target accuracy $\epsilon \in
(0, 1/2)$. To measure accuracy, we use the average probability of error per task denoted by

$$d_m(s, \hat{s}) = \frac{1}{m}\sum_{i=1}^{m} \mathbb{P}(s_i \neq \hat{s}_i).$$

Here the probability is taken over all realizations of the random graph (if using a random graph),
instances of worker responses, and realizations of worker reliability. We will show that $\Omega((1/q)\log(1/\epsilon))$
assignments per task is necessary and sufficient to achieve the target error rate: $d_m(s, \hat{s}) \le \epsilon$.

We first prove the following minimax bound on the error rate. Consider the case where nature
chooses a set of correct answers $s \in \{\pm 1\}^m$ and a distribution $f$ of the worker reliability $p_j$. The
distribution $f$ is chosen from the set of all distributions on $[0, 1]$ which satisfy $\mathbb{E}_f[(2p_j - 1)^2] = q$.
We use $\mathcal{F}(q)$ to denote this set of distributions. Let $\mathcal{G}(m, l)$ denote the set of all bipartite graphs,
including irregular graphs, that have $m$ task nodes and $ml$ total number of edges.

Lemma 2.4. The minimax error rate achieved by the best possible graph $G \in \mathcal{G}(m, l)$ using the
best possible inference algorithm is at least

$$\inf_{\mathrm{ALGO},\, G\in\mathcal{G}(m,l)} \; \sup_{s,\, f\in\mathcal{F}(q)} d_m(s, \hat{s}_{G,\mathrm{ALGO}}) \ge \frac{1}{2}(1 - q)^l,$$

where $\hat{s}_{G,\mathrm{ALGO}}$ denotes the estimate we get using graph $G$ for task allocation and algorithm ALGO for
inference.

This minimax bound is established by computing the error rate of an oracle estimator that makes
an optimal decision given the reliability of every worker. When $q$ is equal to one, the inference is
trivial and we get a trivial lower bound. The inference problem becomes more challenging when
$q \le C_1$ for some numerical constant $C_1 < 1$. In this case, the above lemma implies

$$ \inf_{\mathrm{Algo},\, G \in \mathcal{G}(m,\ell)} \; \sup_{s,\, f \in \mathcal{F}(q)} d_m(s, \hat{s}_{G,\mathrm{Algo}}) \;\ge\; \frac{1}{2}\, e^{-(\ell q + C_2 \ell q^2)} , \tag{3} $$

for some numerical constant $C_2$. Let $\Delta_{\mathrm{LB}}$ be the minimum cost per task necessary to achieve a target
accuracy $\epsilon$ using any graph and the best possible algorithm on that graph. Then, in the case of the
minimax scenario where nature chooses the worst distribution $f$,

$$ \Delta_{\mathrm{LB}} = \Theta\Big(\frac{1}{q}\log\frac{1}{\epsilon}\Big) . \tag{4} $$
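As a rough numerical illustration of how the oracle bound of Lemma 2.4 turns into a budget requirement, the following sketch computes the smallest task degree for which $(1/2)(1-q)^{\ell}$ drops below a target $\epsilon$. The function names are hypothetical and the sketch assumes $0 < q < 1$ and $0 < \epsilon < 1/2$; it only evaluates the stated bound and is not part of the authors' analysis.

```python
import math

def oracle_error_lower_bound(q: float, ell: int) -> float:
    """Lower bound (1/2)(1 - q)^ell on the average error from Lemma 2.4."""
    return 0.5 * (1.0 - q) ** ell

def min_degree_for_target(q: float, eps: float) -> int:
    """Smallest integer ell with (1/2)(1 - q)^ell <= eps (assumes 0 < q < 1, 0 < eps < 1/2)."""
    # (1/2)(1-q)^ell <= eps  <=>  ell >= log(2 eps) / log(1 - q), both logs being negative.
    return max(0, math.ceil(math.log(2.0 * eps) / math.log(1.0 - q)))

if __name__ == "__main__":
    for q in (0.1, 0.3, 0.6):
        ell = min_degree_for_target(q, 0.01)
        print(q, ell, oracle_error_lower_bound(q, ell))
```

For small $q$ the required degree grows roughly like $(1/q)\log(1/(2\epsilon))$, which matches the scaling in (4).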

Next, we show that the error rate of majority voting decays significantly slower. Let $\hat{s}_{G,\mathrm{Majority}}$
be the estimate produced by majority voting on graph $G$.

Lemma 2.5. In the regime where $q \le C_2 < 1$, there exists a numerical constant $C_3$ such
that

$$ \inf_{G \in \mathcal{G}(m,\ell)} \; \sup_{s,\, f \in \mathcal{F}(q)} d_m(s, \hat{s}_{G,\mathrm{Majority}}) \;\ge\; e^{-C_3(\ell q^2 + 1)} . $$

Let $\Delta_{\mathrm{Majority}}$ be the minimum cost per task necessary to achieve a target accuracy $\epsilon$ using
the majority voting scheme on any graph. Then, in the case where nature chooses the worst
distribution $f$,

$$ \Delta_{\mathrm{Majority}} = \Omega\Big(\frac{1}{q^2}\log\frac{1}{\epsilon}\Big) . \tag{5} $$

The lower bound in (3) holds regardless of how many tasks are assigned to each worker. However,
the error rate of our algorithm depends on the value of $r$. We show that for a broad range of parameters
$\ell$, $r$, and $q$, our algorithm achieves optimality. Let $\hat{s}_{\mathrm{Iter}}$ be the estimate given by random regular
graphs and the iterative algorithm. For $\ell q \ge C_4$, $rq \ge C_5$ and $C_4 C_5 > 1$, Corollary 2.2 gives

$$ \lim_{m \to \infty} \; \sup_{s,\, f \in \mathcal{F}(q)} d_m(s, \hat{s}_{\mathrm{Iter}}) \;\le\; e^{-C_6 \ell q} . $$

Let $\Delta_{\mathrm{Iter}}$ be the minimum cost per task sufficient to achieve a target accuracy $\epsilon$ using our
proposed algorithm. Then we get

$$ \Delta_{\mathrm{Iter}} = \Theta\Big(\frac{1}{q}\log\frac{1}{\epsilon}\Big) . $$

Comparing it to the necessary budget in (4), this establishes the order-optimality of our algorithm.
Further, from (5) we see that majority voting is significantly more costly than the optimal scaling of
$(1/q)\log(1/\epsilon)$ of our algorithm. Finally, we emphasize that the low-rank approximation algorithm
is quite efficient. Simple majority voting requires $O((1/q^2)\log(1/\epsilon))$ operations per task to achieve
target error rate $\epsilon$, and our approach requires $O((1/q)\log(m)\log(1/\epsilon))$ operations per task.



2.4 Optimality under the iterative scenario 

We show in this section that, surprisingly, there is no significant gain in switching from our approach 
to an adaptive strategy, which can assign tasks adaptively based on the responses from the workers. 
The order-optimality of our approach, in terms of budget required to achieve a given target accuracy, 
still holds even if we include a more general class of algorithms that can adaptively choose how to 
assign tasks to new batches. 

Under the one-shot scenario, all task allocations have to be done simultaneously. In particular,
we cannot create new batches adaptively based on what we learned from the responses thus far 
(e.g. which workers are more reliable or which tasks we have less confidence in). Alternatively, we 
can consider a more general class of algorithms which operate under an iterative scenario, where 
one can adaptively assign tasks. One might be tempted to first identify which workers are more 
reliable and then assign all the tasks to those workers in an explore/exploit manner. However, with 
crowdsourcing, it is unrealistic to assume that we can identify and reuse any particular worker, 
including the reliable ones, since typical workers are neither persistent nor identifiable and batches 
are distributed through an open-call. Hence, under an iterative scenario in crowdsourcing, although 
we cannot reuse any of the workers, we can adaptively assign more workers to those tasks that we 
have less confidence in, by assigning those tasks to the new batches that we create adaptively. These 
new batches are then distributed to unidentified workers through an open-call using a crowdsourcing 
platform. 

For example, in the first round we can choose a graph and run inference as in the one-shot
scenario. Then, in the second round, we might choose to assign additional workers to those tasks
that are 'less reliable' in order to increase reliability. Can we reduce the budget and still ensure
the same probability of error by exploiting this adaptive process? Surprisingly, we show that in
terms of the average necessary budget, there is no significant gain in using an adaptive strategy.
Hence, our approach of using a random graph with our iterative inference algorithm achieves
order-optimal performance even if we include all the adaptive strategies. This is a stronger result
on order-optimality of our approach, since our algorithm is a non-adaptive one that operates under
the one-shot scenario, which is a special case of the iterative scenario. Precisely, we show that in
order to achieve an average probability of error less than $\epsilon$ using any possible strategy that can
adaptively assign tasks under the iterative scenario with the best possible inference algorithm, it is
still necessary to have a budget that scales like $(1/q)\log(1/\epsilon)$.

This result is established by considering an oracle estimator who knows the reliability of all 
the workers and can use any adaptive task assignment scheme. To compute the accuracy of this 
oracle estimator, we need to assume in this case that the $s_i$'s are chosen independently and there is a
non-vanishing probability of $s_i$ being either positive or negative.

Lemma 2.6. Assume that the correct answer to each task is chosen independently, and that
$\mathbb{P}(s_i = +1)$ and $\mathbb{P}(s_i = -1)$ are each bounded below by a positive constant independent of $q$, $m$,
and $\epsilon$. Given a target accuracy $\epsilon \in (0, 1/2)$, let $\Delta_{\mathrm{AdaptiveLB}}(\epsilon, \mathrm{Algo}, f)$ be the budget per task
necessary to achieve $(1/m)\sum_{i=1}^{m} \mathbb{P}(\hat{s}_i \neq s_i) \le \epsilon$ using algorithm Algo for both inference and
task assignment, when the worker quality $p_j$ is distributed according to a distribution $f$ on $[0,1]$.
Then for any algorithm that operates under the iterative scenario and for $q \in (0, 0.64]$, the
following holds.

$$ \inf_{\mathrm{Algo}} \; \sup_{f \in \mathcal{F}(q)} \mathbb{E}\big[\Delta_{\mathrm{AdaptiveLB}}(\epsilon, \mathrm{Algo}, f)\big] = \Omega\Big(\frac{1}{q}\log\frac{1}{\epsilon}\Big) , $$

where $\mathcal{F}(q)$ denotes the set of all distributions on $[0, 1]$ which satisfy $\mathbb{E}[(2p_j - 1)^2] = q$.

This proves that there is no significant gain in using an adaptive scheme, and our approach
achieves the order-optimal performance with a simple non-adaptive scheme.

We identify that there are two regimes of the worker quality distribution $f$, where the behavior of
the adaptive strategies differs significantly. This is captured in a quantity $W \equiv \mathbb{E}_f[\log(p_j/(1 - p_j))]$.
In particular, to prove the above minimax lower bound, we consider the case when nature chooses
a worst-case worker distribution $f$, and it is crucial that we choose $f$ such that $W$ is finite.

On the other hand, when there is a positive probability that a worker is either a perfect worker
($p_j = 1$) or a perfect adversary ($p_j = 0$), then $W = \infty$ or $W = -\infty$, respectively. If we limit
ourselves to a special case of $f$ in this regime where $W = \pm\infty$, then there are examples where
there is a significant gain in switching to an adaptive scheme. In particular, we can show that there
are examples of the worker quality distribution $f$ where it is both necessary and sufficient to have
budget scaling as $1/q$ using adaptive schemes, as opposed to $(1/q)\log(1/\epsilon)$ in the non-adaptive case.
However, in the general minimax scenario, where nature chooses the worst-case $f$ as in Lemma
2.6, there is no gain in using an adaptive scheme, and our approach achieves the order-optimal
performance with a simple non-adaptive scheme. Further, in practice, it is unlikely that there will
be perfect workers or perfect adversaries, hence it is expected that $W < \infty$.

To be concrete, let us consider the spammer-hammer model, where a worker is either a spammer
with $p_j = 1/2$ or a hammer with $p_j = 1$. A worker's quality is randomly chosen such that she is a
hammer with probability $q$. Notice that, in this case, we are in the regime where $W = \infty$. We know
that under this spammer-hammer model, if we only use non-adaptive schemes, then it is necessary
and sufficient to have a budget that scales like $(1/q)\log(1/\epsilon)$ (cf. proof of Lemma 2.4). However, in
the following, we will show that an adaptive strategy can achieve a target accuracy $\epsilon$ with a budget
that scales as $1/q$, achieving a significant gain over non-adaptive schemes. Further, we provide a
matching lower bound using an oracle estimator.

Consider an algorithm which partitions the m tasks into sets of size r. At each round, for a 
particular set of r tasks, the algorithm recruits a worker and assigns all r tasks to that new worker. 
The algorithm stops when there are two workers who completely agree on all $r$ tasks, and then
moves on to the next set of tasks. This algorithm will surely stop if we have two hammers, in which
case the expected number of workers we need to recruit is at most $2/q$. Further, if we choose $r$
to be $\Omega(\log(1/\epsilon) + \log(1/q))$, then, when this algorithm stops, we can ensure that the probability
of making an error is at most $\epsilon$. Hence, with this algorithm, it is sufficient to achieve any target
accuracy $\epsilon$ with budget scaling like $1/q$. Notice that in order to prove this sufficient condition, it
is necessary to have the worker degree $r$ which scales as $\Omega(\log(1/\epsilon) + \log(1/q))$. There might be
ways to improve this condition on r by using a more sophisticated algorithm, which might be an 
interesting future research direction. 
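The adaptive scheme described above can be sketched as a small simulation under the spammer-hammer model. This is a minimal sketch and not the authors' implementation: the function names, the seed, and the convention that all true answers are $+1$ are assumptions made for illustration. Each recruited worker answers every task in the block, so the number of workers recruited for a block equals the number of assignments per task.

```python
import random

def run_adaptive_block(r, q, rng):
    """Recruit workers for one block of r tasks until two of them agree on all r answers.

    Assumes the true answers are all +1 (without loss of generality).
    Returns (number of workers recruited, whether the agreed answers are all correct).
    """
    seen = set()      # answer vectors submitted so far for this block
    workers = 0
    while True:
        workers += 1
        if rng.random() < q:
            answers = (1,) * r                                        # hammer: always correct
        else:
            answers = tuple(rng.choice((-1, 1)) for _ in range(r))    # spammer: fair coins
        if answers in seen:                                           # complete agreement: stop
            return workers, all(a == 1 for a in answers)
        seen.add(answers)

def simulate(num_blocks, r, q, seed=0):
    """Return (average workers per block, fraction of blocks answered incorrectly)."""
    rng = random.Random(seed)
    results = [run_adaptive_block(r, q, rng) for _ in range(num_blocks)]
    avg_workers = sum(w for w, _ in results) / num_blocks
    error_rate = sum(1 for _, ok in results if not ok) / num_blocks
    return avg_workers, error_rate

if __name__ == "__main__":
    print(simulate(num_blocks=2000, r=20, q=0.3))
```

With $r$ large enough, two spammers agree on all $r$ answers only with probability $2^{-r}$, so the stopping rule is almost always triggered by two hammers and the average number of workers stays close to $2/q$.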

Next, consider an oracle estimator which knows who is a spammer and who is a hammer. Then, 
under the iterative scenario, the oracle estimator can assign workers to a task $t_i$ until a hammer
is assigned. This requires recruiting $1/q$ workers in expectation, and the probability of error is
zero. Hence, if we let $\Delta_{\mathrm{AdaptiveSH}}$ be the budget necessary to achieve a given accuracy $\epsilon$ under the
spammer-hammer model, then this implies that $\mathbb{E}[\Delta_{\mathrm{AdaptiveSH}}] \ge 1/q$.



2.5 Discussion 

Here we discuss several implications of our result and possible future research directions in gener- 
alizing the model studied in this paper. 

First, we provide a performance bound under the below-threshold regime when $\hat{\ell}\hat{r}q^2 < 1$.
Notice that the bound in Theorem 2.1 is only meaningful when it is less than a half, whence
$\hat{\ell}\hat{r}q^2 > 1$ and $\ell q \ge 6\log(2) > 4$. While as a taskmaster the case of $\hat{\ell}\hat{r}q^2 < 1$ may not be of
interest, for the purpose of completeness we comment on the performance of our algorithm in this
regime. Specifically, we empirically observe that the error rate increases as the number of
iterations $k$ increases. Therefore, it makes sense to use $k = 1$, in which case the algorithm
essentially boils down to the majority rule. We can prove the following error bound.

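Lemma 2.7. For any value of $\ell$, $r$ and $\mu$, the estimates we get after the first step of our algorithm
achieve

$$ \lim_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} \mathbb{P}(s_i \neq \hat{s}_i) \;\le\; e^{-\ell\mu^2/4} . $$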

Next, consider the variation where we ask questions to workers whose answers are already known 
in order to assess the quality of the workers (also known as 'gold standard units'). There are two 
ways we can use this information. First, we can place 'seed gold units' along with the standard 
tasks, and use these 'seed gold units' in turn to perform more informed inference. However, we can 
show that there is no gain in using such 'seed gold units'. The optimal lower bound of $(1/q)\log(1/\epsilon)$
essentially utilizes the existence of an oracle that can identify the reliability of every worker exactly,
i.e. the oracle has a lot more information than what can be gained by such qualifying questions.
Therefore, clearly 'seed gold units' do not help the oracle estimator, and hence the order optimality
of our approach still holds even if we include all the strategies that can utilize these 'seed gold 
units'. 

Second, we can use 'pilot gold units' as qualifying or pilot questions that the workers must 
complete to qualify to participate. Typically a taskmaster does not have to pay for these qualifying
questions, and this provides an effective way to increase the quality of the participating workers.
Our approach can benefit from such 'pilot gold units', which have the effect of increasing the effective
collective quality of the crowd $q$. Further, if we can 'measure' how the distribution of workers changes
when using pilot questions, then our main result fully describes how much we can gain by such
pilot questions. In any case, pilot questions only change the distribution of participating workers,
and the order-optimality of our approach still holds even if we compare all the schemes that use
the same pilot questions. 

Next, we consider workers with different reliabilities and their prices or payments. Specifically,
suppose there are $K$ classes of workers: workers of class $k$, $1 \le k \le K$, have reliability
distribution parameter $q_k$, and each of them requires a payment of $c_k$ to perform a task. Now our
optimality result suggests that the per-task cost scales as $(c_k/q_k)\log(1/\epsilon)$ if we only use workers
of class $k$. More generally, if we use a mix of these workers, say an $\alpha_k$ fraction of workers from class
$k$, with $\sum_k \alpha_k = 1$, then the effective parameter is $q = \sum_k \alpha_k q_k$. Subject to this, the optimal
per-task cost scales as $\big(\sum_k \alpha_k c_k\big)/\big(\sum_k \alpha_k q_k\big)\log(1/\epsilon)$. This immediately suggests that the optimal
choice of fractions $\alpha_k$ must be such that $\alpha_k > 0$ only if $c_k/q_k = \min_i c_i/q_i$. That is, the optimal choice
is to select workers only from the classes that have the minimal ratio $c_k/q_k$ over $1 \le k \le K$.
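The class-selection rule can be checked numerically. The sketch below uses hypothetical helper names and made-up example costs and qualities: it picks the classes with the smallest ratio $c_k/q_k$ and evaluates the cost factor $(\sum_k \alpha_k c_k)/(\sum_k \alpha_k q_k)$ for an arbitrary mixture, leaving out the common $\log(1/\epsilon)$ factor.

```python
def best_worker_classes(costs, qualities):
    """Indices of the classes minimizing c_k / q_k (assumes every q_k > 0)."""
    ratios = [c / q for c, q in zip(costs, qualities)]
    best = min(ratios)
    return [k for k, ratio in enumerate(ratios) if ratio == best]

def mixed_cost_factor(alphas, costs, qualities):
    """Per-task cost factor (sum_k alpha_k c_k) / (sum_k alpha_k q_k) for mixture alphas."""
    num = sum(a * c for a, c in zip(alphas, costs))
    den = sum(a * q for a, q in zip(alphas, qualities))
    return num / den

if __name__ == "__main__":
    costs = [1.0, 2.0, 4.0]        # illustrative payments c_k
    qualities = [0.1, 0.4, 0.5]    # illustrative reliability parameters q_k
    print(best_worker_classes(costs, qualities))
    print(mixed_cost_factor([0.5, 0.5, 0.0], costs, qualities))
    print(mixed_cost_factor([0.0, 1.0, 0.0], costs, qualities))
```

Because the mixed cost factor is a weighted average of the individual ratios $c_k/q_k$ (with weights proportional to $\alpha_k q_k$), no mixture can beat the best single class.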

We now discuss the assumed knowledge of $q$ in selecting the degree $\ell = \Theta((1/q)\log(1/\epsilon))$ in the
design of the regular bipartite graph that achieves a given target error rate. Here is a simple way
to overcome this limitation at the loss of only an additional constant factor, i.e. the scaling of cost per
task still remains $\Theta((1/q)\log(1/\epsilon))$. To that end, consider an incremental design in which at step $a$
the system is designed assuming $q = 2^{-a}$ for $a \ge 1$. At step $a$, we design two replicas of the task
allocation for $q = 2^{-a}$. Now compare the estimates obtained by these two replicas for all $m$ tasks.
If they agree on $m(1 - 2\epsilon)$ tasks, then we stop and declare that as the final answer. Otherwise,
we increase $a$ to $a + 1$ and repeat. Note that by our optimality result, it follows that if $2^{-a}$ is
less than the actual $q$ then the iteration must stop with high probability. Therefore, the total cost
paid is $\Theta((1/q)\log(1/\epsilon))$ with high probability. Thus, even lack of knowledge of $q$ does not affect the
optimality of our algorithm.
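The constant-factor claim for the incremental design comes from a geometric sum, which the following sketch makes explicit. It is an idealized accounting under stated assumptions, not a simulation of the inference algorithm: step $a$ is assumed to cost two replicas of $c \cdot 2^{a}\log(1/\epsilon)$ assignments per task for a hypothetical constant $c$, and the procedure is assumed to stop at the first step with $2^{-a} \le q$ (the high-probability event mentioned in the text).

```python
import math

def doubling_cost_per_task(q, eps, c=1.0):
    """Total per-task cost of the incremental design, and the step at which it stops.

    Idealized assumptions: step a costs 2 * c * 2**a * log(1/eps) (two replicas designed
    for the guess q = 2**-a), and the procedure stops at the first a with 2**-a <= q.
    """
    a = 1
    total = 0.0
    while True:
        total += 2 * c * (2 ** a) * math.log(1.0 / eps)
        if 2.0 ** (-a) <= q:
            return total, a
        a += 1

if __name__ == "__main__":
    for q in (0.9, 0.3, 0.01):
        total, a = doubling_cost_per_task(q, eps=0.05)
        print(q, a, total, 8 * (1.0 / q) * math.log(1.0 / 0.05))
```

Under these assumptions the accumulated cost is less than $8c\,(1/q)\log(1/\epsilon)$, since the last guess satisfies $2^{a} < 2/q$ and the earlier steps add at most the same amount again.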

Finally, we consider possible generalization of our model. The model assumed in this paper 
does not capture several factors: tasks with different levels of difficulty or workers who always
answer positive or negative. It is desirable to characterize how our algorithm works under such a 
more general model. 

In general, the responses of a worker $j$ to a binary question $i$ may depend on several factors:
(i) the correct answer to the task; (ii) the difficulty of the task; (iii) the expertise or the reliability
of the worker; (iv) the bias of the worker towards positive or negative answers. Let $s_i \in \{+1, -1\}$
represent the correct answer and $r_i \in [0, \infty]$ represent the level of difficulty of task $i$. Here $r_i = 0$
means that the task is so easy that any worker can find the correct answer, and $r_i = \infty$ means that
the task is so difficult that even the most diligent worker cannot resolve which is the correct answer.
Let $\alpha_j \in [-\infty, \infty]$ represent the reliability and $\beta_j \in (-\infty, \infty)$ represent the bias of worker $j$. Here,
$\alpha_j = \infty$ means that the worker answers all tasks correctly, $\alpha_j = -\infty$ means that the worker answers
all tasks incorrectly, and $\alpha_j = 0$ means that the worker gives random answers independent of what
the correct answer is. Further, $\beta_j = \infty$ or $-\infty$ means that the worker always answers positive or
negative respectively, and $\beta_j = 0$ means that the worker is unbiased and is equally likely to make an
error whether the correct answer is positive or negative. In formula, a worker $j$'s response to a binary
task $i$ can be modeled as

$$ A_{ij} = \mathrm{sign}(Z_{ij}) , \tag{6} $$

where $Z_{ij}$ is a Gaussian random variable distributed as $Z_{ij} \sim \mathcal{N}(\alpha_j s_i + \beta_j, r_i)$, and it is understood
that $\mathrm{sign}(Z)$ is an unbiased binary random variable for a Gaussian $Z$ with infinite variance, and
$\mathrm{sign}(Z) = 1$ almost surely for $Z \sim \mathcal{N}(\infty, 1)$. Except for the fact that we consider a random variable
$Z_{ij}$ instead of a random vector, this is equivalent to the model studied in [WBBP10].

Most of the models studied in the crowdsourcing literature can be reduced to a special case of
this model. For example, the early crowdsourcing model introduced by Dawid and Skene [DS79]
is equivalent to the above Gaussian model with $r_i = 1$. More recently, Whitehill et al. [WRW+09]
introduced another model where $\mathbb{P}(A_{ij} = s_i \mid a_i, b_j) = 1/(1 + e^{-a_i b_j})$, with worker reliability $a_i$ and
task difficulty $b_j$. This is again a special case of the above Gaussian model if we set $\beta_j = 0$. The
model we study in this paper has an underlying assumption that all the tasks share an equal level
of difficulty and the workers are unbiased. It is equivalent to the above Gaussian model with $\beta_j = 0$
and $r_i = 1$. In this case, there is a one-to-one relation between the worker reliability $p_j$ and $\alpha_j$:
$p_j = Q(-\alpha_j)$, where $Q(\cdot)$ is the tail probability of the standard Gaussian distribution.
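To make the Gaussian response model concrete, here is a minimal simulation sketch. The function names are hypothetical, and the second parameter of $\mathcal{N}(\cdot, r_i)$ is treated as a variance, which is an assumption about the notation. In the unbiased, unit-difficulty case ($\beta_j = 0$, $r_i = 1$) the probability of a correct answer is $\mathbb{P}(Z > 0)$ for $Z \sim \mathcal{N}(\alpha_j, 1)$, which equals the standard normal CDF evaluated at $\alpha_j$.

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sample_response(s_i, alpha_j, beta_j, r_i, rng):
    """Draw A_ij = sign(Z_ij) with Z_ij ~ N(alpha_j * s_i + beta_j, r_i).

    r_i is treated as the variance of Z_ij (an assumption about the notation).
    """
    z = rng.gauss(alpha_j * s_i + beta_j, math.sqrt(r_i))
    return 1 if z >= 0 else -1

def empirical_reliability(alpha_j, n, seed=0):
    """Fraction of correct answers of an unbiased worker on n unit-difficulty tasks."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        s_i = rng.choice((-1, 1))
        if sample_response(s_i, alpha_j, 0.0, 1.0, rng) == s_i:
            correct += 1
    return correct / n

if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0, 2.0):
        print(alpha, empirical_reliability(alpha, 20000), normal_cdf(alpha))
```

Setting $\alpha_j = 0$ reproduces a spammer ($p_j = 1/2$), and large $\alpha_j$ approaches a hammer ($p_j = 1$).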

The role of the bias of the workers is also important in correctly identifying the quality of the 
workers, in order to selectively pay the workers based on their performance. Ipeirotis, Provost, 
and Wang studied how to separate the true error rate from the biases that some workers exhibit in 
order to obtain better evaluation of the workers' quality in [IPW10].

2.6 Relations to low-rank matrix approximation 

The leading singular vectors are often used to capture the important aspects of datasets in matrix
form. In our case, the leading left singular vector of $A$ can be used to estimate the correct answers,
where $A \in \{0, \pm 1\}^{m \times n}$ is the $m \times n$ adjacency matrix of the graph $G$ weighted by the submitted
answers. One way to compute the leading singular vector is through power iteration: for two vectors
$u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$, starting with a randomly initialized $v$, power iteration iteratively updates $u$
and $v$ according to

$$ \text{for all } i, \;\; u_i = \sum_{j \in \partial i} A_{ij} v_j , \qquad \text{and for all } j, \;\; v_j = \sum_{i \in \partial j} A_{ij} u_i . $$

It is known that normalized $u$ (and $v$) converges linearly to the leading left (and right) singular
vector. This update rule is very similar to that of our iterative algorithm. But there is one
difference that is crucial in the analysis: in our algorithm we follow the framework of the celebrated
belief propagation algorithm [Pea88, YFW03] and exclude the incoming message from node $j$ when
computing an outgoing message to $j$. This extrinsic nature of our algorithm and the locally
tree-like structure of sparse random graphs [RU08, MM09] allow us to perform sharp analysis on the
average error rate. In particular, if we use the leading singular vector of $A$ to estimate $s$, such
that $\hat{s}_i = \mathrm{sign}(u_i)$, then existing analysis techniques from random matrix theory do not give the
strong performance guarantee we have (cf. [KOS11, Theorem II.1]). These techniques typically
focus on understanding how the subspace spanned by the top singular vector behaves. To get the
sharp bound in Theorem 2.1 we need to analyze how each entry of the leading singular vector is
distributed. We introduce the iterative algorithm in order to precisely characterize how each of
the decision variables $x_i$ is distributed. Since the iterative algorithm introduced in this paper is
quite similar to power iteration used to compute the leading singular vectors, this suggests that
our analysis may shed light on how to analyze the top singular vectors of a sparse random matrix.
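The power-iteration update described in this subsection can be sketched in a few lines. This is plain power iteration on a dense list-of-lists matrix, not the message-passing algorithm of the paper; the function name, the normalization step, and the fixed iteration count are illustrative choices. Note that singular vectors are only defined up to a global sign, so $\mathrm{sign}(u_i)$ recovers the answers up to a simultaneous flip of all of them.

```python
import math
import random

def power_iteration(A, iters=50, seed=0):
    """Approximate the leading left/right singular vectors (u, v) of matrix A.

    A is a list of m rows of length n with entries in {0, +1, -1}; absent edges are 0.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]       # random initialization of v
    u = [0.0] * m
    for _ in range(iters):
        # u_i = sum_j A_ij v_j
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        # v_j = sum_i A_ij u_i, then normalize to keep the numbers bounded
        v = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]
    return u, v

def estimate_answers(A, iters=50, seed=0):
    """Estimate s_i = sign(u_i); the result is only determined up to a global sign."""
    u, _ = power_iteration(A, iters, seed)
    return [1 if x > 0 else -1 for x in u]

if __name__ == "__main__":
    s = [1, -1, 1, 1]
    t = [1, 1, -1]
    A = [[si * tj for tj in t] for si in s]   # rank-one toy example
    print(estimate_answers(A))
```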



3 Proof of main results 

In this section, we provide proofs of the main results.

3.1 Proof of Theorem 2.1

By symmetry, we can assume all $s_i$'s are $+1$. If $I$ is a random integer drawn uniformly in $[m]$, then

$$ \frac{1}{m} \sum_{i \in [m]} \mathbb{P}(s_i \neq \hat{s}_i) = \mathbb{P}(s_I \neq \hat{s}_I) = \mathbb{P}(x_I^{(k)} < 0) + \frac{1}{2}\,\mathbb{P}(x_I^{(k)} = 0) \le \mathbb{P}(x_I^{(k)} \le 0) , $$

where $x_i^{(k)}$ denotes the decision variable for task $i$ after $k$ iterations of the iterative algorithm.
Asymptotically, for fixed $k$, $\ell$ and $r$, the local neighborhood of $x_I$ converges to a regular tree. To
analyze $\lim_{m \to \infty} \mathbb{P}(x_I^{(k)} \le 0)$, we use a standard probabilistic analysis technique known as 'density
evolution' in coding theory or 'recursive distributional equations' in probabilistic combinatorics
[RU08, MM09]. Precisely, we use the following equality that in the large system limit,

$$ \lim_{m \to \infty} \mathbb{P}(x_I^{(k)} \le 0) = \mathbb{P}(\hat{x}^{(k)} \le 0) , \tag{7} $$

where $\hat{x}^{(k)}$ is defined through density evolution equations (8) and (9) in the following.
Density evolution. In the large system limit as $m \to \infty$, the $(\ell, r)$-regular random graph locally
converges in distribution to a $(\ell, r)$-regular tree. Therefore, for a randomly chosen edge $(i, j)$, the
messages $x_{i \to j}$ and $y_{j \to i}$ converge in distribution to $x$ and $y_p$ defined in the following density
evolution equations (8). Here and after, we drop the superscript $k$ denoting the iteration number
whenever it is clear from the context. We initialize $y_p$ with a Gaussian distribution independent of
$p$: $y_p^{(0)} \sim \mathcal{N}(1, 1)$. Let $\stackrel{d}{=}$ denote equality in distribution. Then, for $k \in \{1, 2, \ldots\}$,

$$ x^{(k)} \stackrel{d}{=} \sum_{i \in [\hat{\ell}]} z_{p_i,i}\, y_{p_i,i}^{(k-1)} , \qquad y_p^{(k)} \stackrel{d}{=} \sum_{j \in [\hat{r}]} z_{p,j}\, x_j^{(k)} , \tag{8} $$

where $x_j$'s, $p_i$'s, and $y_{p,i}$'s are independent copies of $x$, $p$, and $y_p$, respectively. Also, $z_{p,i}$'s and
$z_{p,j}$'s are independent copies of $z_p$. $p \in [0, 1]$ is a random variable distributed according to the
distribution of the worker's quality. $z_{p,j}$'s and $x_j$'s are independent. $z_{p_i,i}$'s and $y_{p_i,i}$'s are conditionally
independent conditioned on $p_i$. Finally, $z_p$ is a random variable distributed as

$$ z_p = \begin{cases} +1 & \text{with probability } p , \\ -1 & \text{with probability } 1 - p . \end{cases} $$

Then, for a randomly chosen $I$, the decision variable $x_I^{(k)}$ converges in distribution to

$$ \hat{x}^{(k)} \stackrel{d}{=} \sum_{i \in [\ell]} z_{p_i,i}\, y_{p_i,i}^{(k-1)} . \tag{9} $$

Numerically or analytically computing the densities in (8) exactly is not computationally feasible
when the messages take continuous values, as is the case for our algorithm. Typically, heuristics are
used to approximate the densities, such as quantizing the messages, approximating the density with
simple functions, or using Monte Carlo methods to sample from the density. The novel contribution of
our analysis is that we prove that the messages are sub-Gaussian using recursion, and we provide
an upper bound on the parameters in a closed form. This allows us to prove the sharp result on
the error bound that decays exponentially.

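Mean and variance computation. To give an intuition on how the messages behave, we
describe the evolution of the mean and the variance of the random variables in (8). Define
$m^{(k)} \equiv \mathbb{E}[x^{(k)}]$, $\hat{m}_p^{(k)} \equiv \mathbb{E}[y_p^{(k)} \mid p]$, $v^{(k)} \equiv \mathrm{Var}(x^{(k)})$, and $\hat{v}_p^{(k)} \equiv \mathrm{Var}(y_p^{(k)} \mid p)$. Let $p$ be a random
variable distributed according to the distribution of the worker's quality. Then, from (8) we get that

$$ \begin{aligned}
m^{(k)} &= \hat{\ell}\, \mathbb{E}_p\big[(2p - 1)\,\hat{m}_p^{(k-1)}\big] , \\
\hat{m}_p^{(k)} &= \hat{r}\,(2p - 1)\, m^{(k)} , \\
v^{(k)} &= \hat{\ell}\, \Big\{ \mathbb{E}_p\big[\hat{v}_p^{(k-1)} + (\hat{m}_p^{(k-1)})^2\big] - \big(\mathbb{E}_p\big[(2p - 1)\,\hat{m}_p^{(k-1)}\big]\big)^2 \Big\} , \\
\hat{v}_p^{(k)} &= \hat{r}\, \Big\{ v^{(k)} + (m^{(k)})^2 - \big((2p - 1)\, m^{(k)}\big)^2 \Big\} .
\end{aligned} $$

Recall that $\mu = \mathbb{E}[2p - 1]$ and $q = \mathbb{E}[(2p - 1)^2]$. Substituting $\hat{m}_p$ and $\hat{v}_p$, we get the following
evolution of the first and the second moment of the random variable $x^{(k)}$.

$$ \begin{aligned}
m^{(k+1)} &= \hat{\ell}\hat{r} q\, m^{(k)} , \\
v^{(k+1)} &= \hat{\ell}\hat{r}\, v^{(k)} + \hat{\ell}\hat{r}\, (m^{(k)})^2 (1 - q)(1 + \hat{r} q) .
\end{aligned} $$

Since $\hat{m}_p^{(0)} = 1$ and $\hat{v}_p^{(0)} = 1$ as per our assumption, we have $m^{(1)} = \mu\hat{\ell}$ and $v^{(1)} = \hat{\ell}(4 - \mu^2)$. This
implies that $m^{(k)} = \mu\hat{\ell}(\hat{\ell}\hat{r}q)^{k-1}$ and $v^{(k)} = a v^{(k-1)} + b c^{k-2}$, with $a = \hat{\ell}\hat{r}$, $b = \mu^2\hat{\ell}^3\hat{r}(1 - q)(1 + \hat{r}q)$,
and $c = (\hat{\ell}\hat{r}q)^2$. After some algebra, it follows that $v^{(k)} = v^{(1)} a^{k-1} + b c^{k-2} \sum_{j=0}^{k-2} (a/c)^j$.
For $\hat{\ell}\hat{r}q^2 > 1$, we have $a/c < 1$ and

$$ v^{(k)} = \hat{\ell}(4 - \mu^2)(\hat{\ell}\hat{r})^{k-1} + (1 - q)(1 + \hat{r}q)\,\mu^2\hat{\ell}^2(\hat{\ell}\hat{r}q)^{2k-2}\, \frac{1 - 1/(\hat{\ell}\hat{r}q^2)^{k-1}}{\hat{\ell}\hat{r}q^2 - 1} . $$

The first and second moment of the decision variable $\hat{x}^{(k)}$ in (9) can be computed using a similar
analysis: $\mathbb{E}[\hat{x}^{(k)}] = (\ell/\hat{\ell})\, m^{(k)}$ and $\mathrm{Var}(\hat{x}^{(k)}) = (\ell/\hat{\ell})\, v^{(k)}$. In particular, we have

$$ \frac{\mathrm{Var}(\hat{x}^{(k)})}{\mathbb{E}[\hat{x}^{(k)}]^2} = \frac{4 - \mu^2}{\ell\mu^2(\hat{\ell}\hat{r}q^2)^{k-1}} + \frac{\hat{\ell}(1 - q)(1 + \hat{r}q)}{\ell(\hat{\ell}\hat{r}q^2 - 1)} \Big(1 - \frac{1}{(\hat{\ell}\hat{r}q^2)^{k-1}}\Big) . $$

Applying Chebyshev's inequality, it immediately follows that $\mathbb{P}(\hat{x}^{(k)} \le 0)$ is bounded by the
right-hand side of the above equality. This bound is weaker compared to the bound in Theorem 2.1.
In the following, we prove the stronger result using sub-Gaussianity of $x^{(k)}$'s. But first, let us
analyze what this weaker bound gives for different regimes of $\ell$, $r$, and $q$, which indicates that the
messages exhibit a fundamentally different behavior in the regimes separated by a phase transition
at $\hat{\ell}\hat{r}q^2 = 1$.

In a 'good' regime where we have $\hat{\ell}\hat{r}q^2 > 1$, the bound converges to a finite limit as the number
of iterations $k$ grows. Namely,

$$ \lim_{k \to \infty} \mathbb{P}(\hat{x}^{(k)} \le 0) \le \frac{\hat{\ell}(1 - q)(1 + \hat{r}q)}{\ell(\hat{\ell}\hat{r}q^2 - 1)} . $$

In the case when $\hat{\ell}\hat{r}q^2 < 1$, the same analysis gives

$$ \frac{\mathrm{Var}(\hat{x}^{(k)})}{\mathbb{E}[\hat{x}^{(k)}]^2} = e^{\Theta(k)} . $$

Finally, when $\hat{\ell}\hat{r}q^2 = 1$, we get $v^{(k)} = \hat{\ell}(4 - \mu^2)(\hat{\ell}\hat{r})^{k-1} + \mu^2\hat{\ell}^2(1 - q)(1 + \hat{r}q)(\hat{\ell}\hat{r}q)^{2k-2}(k - 1)$, which implies

$$ \frac{\mathrm{Var}(\hat{x}^{(k)})}{\mathbb{E}[\hat{x}^{(k)}]^2} = \Theta(k) . $$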
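The phase transition at $\hat{\ell}\hat{r}q^2 = 1$ can be observed by iterating the moment recursion directly. The sketch below is only a numerical illustration with hypothetical function names; it assumes $\hat{\ell} = \ell - 1$ and takes the initial values $m^{(1)} = \mu\hat{\ell}$ and $v^{(1)} = \hat{\ell}(4 - \mu^2)$ as stated in the text.

```python
def chebyshev_ratio(l_hat, r_hat, q, mu, k):
    """Var(x_hat^(k)) / E[x_hat^(k)]^2 obtained by iterating the moment recursion.

    Uses m^(k+1) = l r q m^(k) and v^(k+1) = l r v^(k) + l r (m^(k))^2 (1-q)(1 + r q),
    with the initial values stated in the text and ell = l_hat + 1 (an assumption).
    """
    m = mu * l_hat
    v = l_hat * (4.0 - mu ** 2)
    for _ in range(k - 1):
        # The right-hand side uses the old m for both updates.
        m, v = (l_hat * r_hat * q * m,
                l_hat * r_hat * v + l_hat * r_hat * m * m * (1 - q) * (1 + r_hat * q))
    ell = l_hat + 1
    return (l_hat / ell) * v / (m * m)

def good_regime_limit(l_hat, r_hat, q):
    """Limit of the ratio as k grows, valid when l_hat * r_hat * q^2 > 1."""
    ell = l_hat + 1
    return l_hat * (1 - q) * (1 + r_hat * q) / (ell * (l_hat * r_hat * q ** 2 - 1))

if __name__ == "__main__":
    # Above the threshold the ratio settles; below it the ratio keeps growing with k.
    print([chebyshev_ratio(14, 14, 0.3, 0.3, k) for k in (1, 2, 5, 30)])
    print(good_regime_limit(14, 14, 0.3))
    print([chebyshev_ratio(14, 14, 0.05, 0.05, k) for k in (1, 2, 5, 10)])
```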



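Analyzing the density. Our strategy to provide an upper bound on $\mathbb{P}(\hat{x}^{(k)} \le 0)$ is to show that
$\hat{x}^{(k)}$ is sub-Gaussian with appropriate parameters and use the Chernoff bound. A random variable
$z$ with mean $m$ is said to be sub-Gaussian with parameter $\sigma^2$ if for all $\lambda \in \mathbb{R}$ the following inequality
holds:

$$ \mathbb{E}[e^{\lambda z}] \le e^{m\lambda + (1/2)\sigma^2\lambda^2} . $$

Define

$$ \sigma_k^2 \equiv 2\hat{\ell}(\hat{\ell}\hat{r})^{k-1} + \mu^2\hat{\ell}^3\hat{r}(3q\hat{r} + 1)(q\hat{\ell}\hat{r})^{2k-4}\, \frac{1 - (1/(q^2\hat{\ell}\hat{r}))^{k-1}}{1 - (1/(q^2\hat{\ell}\hat{r}))} , $$

and $m_k \equiv \mu\hat{\ell}(q\hat{\ell}\hat{r})^{k-1}$ for $k \in \mathbb{Z}$. We will first show that $x^{(k)}$ is sub-Gaussian with mean $m_k$
and parameter $\sigma_k^2$ for a regime of $\lambda$ we are interested in. Precisely, we will show that for
$|\lambda| \le 1/(2m_{k-1}\hat{r})$,

$$ \mathbb{E}[e^{\lambda x^{(k)}}] \le e^{m_k\lambda + (1/2)\sigma_k^2\lambda^2} . \tag{10} $$

By definition, due to distributional independence, we have $\mathbb{E}[e^{\lambda\hat{x}^{(k)}}] = \mathbb{E}[e^{\lambda x^{(k)}}]^{\ell/\hat{\ell}}$. Therefore, it
follows from (10) that $\hat{x}^{(k)}$ satisfies $\mathbb{E}[e^{\lambda\hat{x}^{(k)}}] \le e^{(\ell/\hat{\ell}) m_k\lambda + (\ell/2\hat{\ell})\sigma_k^2\lambda^2}$. Applying the Chernoff bound
with $\lambda = -m_k/\sigma_k^2$, we get

$$ \mathbb{P}(\hat{x}^{(k)} \le 0) \le \mathbb{E}[e^{\lambda\hat{x}^{(k)}}] \le e^{-\ell m_k^2/(2\hat{\ell}\sigma_k^2)} . \tag{11} $$

Since $m_k m_{k-1}/\sigma_k^2 \le \mu^2\hat{\ell}^2(q\hat{\ell}\hat{r})^{2k-3}/(3\mu^2 q\hat{\ell}^3\hat{r}^2(q\hat{\ell}\hat{r})^{2k-4}) = 1/(3\hat{r})$, it is easy to check that
$|\lambda| \le 1/(2m_{k-1}\hat{r})$. Substituting (11) in (7), this finishes the proof of Theorem 2.1.

Now we are left to prove that $x^{(k)}$ is sub-Gaussian with appropriate parameters. We can write
down a recursive formula for the evolution of the moment generating functions of $x$ and $y_p$ as

$$ \mathbb{E}[e^{\lambda x^{(k)}}] = \Big( \mathbb{E}_p\big[\, p\,\mathbb{E}[e^{\lambda y_p^{(k-1)}} \mid p] + \bar{p}\,\mathbb{E}[e^{-\lambda y_p^{(k-1)}} \mid p] \,\big] \Big)^{\hat{\ell}} , \tag{12} $$

$$ \mathbb{E}[e^{\lambda y_p^{(k)}}] = \Big( p\,\mathbb{E}[e^{\lambda x^{(k)}}] + \bar{p}\,\mathbb{E}[e^{-\lambda x^{(k)}}] \Big)^{\hat{r}} , \tag{13} $$

where $\bar{p} = 1 - p$. We can prove that these are sub-Gaussian using induction.

First, for $k = 1$, we show that $x^{(1)}$ is sub-Gaussian with mean $m_1 = \mu\hat{\ell}$ and parameter $\sigma_1^2 = 2\hat{\ell}$,
where $\mu \equiv \mathbb{E}[2p - 1]$. Since $y_p$ is initialized as Gaussian with unit mean and variance, we have
$\mathbb{E}[e^{\lambda y_p^{(0)}}] = e^{\lambda + (1/2)\lambda^2}$ regardless of $p$. Substituting this into (12), we get for any $\lambda$,

$$ \mathbb{E}[e^{\lambda x^{(1)}}] = \Big( \mathbb{E}[p]\, e^{\lambda} + (1 - \mathbb{E}[p])\, e^{-\lambda} \Big)^{\hat{\ell}} e^{(1/2)\hat{\ell}\lambda^2} \le e^{\hat{\ell}\mu\lambda + \hat{\ell}\lambda^2} , \tag{14} $$

where the inequality follows from the fact that $a e^{z} + (1 - a) e^{-z} \le e^{(2a - 1)z + (1/2)z^2}$ for any $z \in \mathbb{R}$
and $a \in [0, 1]$ (cf. [AS08, Lemma A.1.5]).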



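Next, assuming $\mathbb{E}[e^{\lambda x^{(k)}}] \le e^{m_k\lambda + (1/2)\sigma_k^2\lambda^2}$ for $|\lambda| \le 1/(2m_{k-1}\hat{r})$, we show that
$\mathbb{E}[e^{\lambda x^{(k+1)}}] \le e^{m_{k+1}\lambda + (1/2)\sigma_{k+1}^2\lambda^2}$ for $|\lambda| \le 1/(2m_k\hat{r})$, and compute appropriate $m_{k+1}$ and
$\sigma_{k+1}^2$. Substituting the bound $\mathbb{E}[e^{\lambda x^{(k)}}] \le e^{m_k\lambda + (1/2)\sigma_k^2\lambda^2}$ in (13), we get
$\mathbb{E}[e^{\lambda y_p^{(k)}}] \le \big(p e^{m_k\lambda} + \bar{p} e^{-m_k\lambda}\big)^{\hat{r}} e^{(1/2)\hat{r}\sigma_k^2\lambda^2}$. Further applying this bound in (12), we get

$$ \mathbb{E}[e^{\lambda x^{(k+1)}}] \le \Big( \mathbb{E}_p\big[\, p\,(p e^{m_k\lambda} + \bar{p} e^{-m_k\lambda})^{\hat{r}} + \bar{p}\,(p e^{-m_k\lambda} + \bar{p} e^{m_k\lambda})^{\hat{r}} \,\big] \Big)^{\hat{\ell}} e^{(1/2)\hat{\ell}\hat{r}\sigma_k^2\lambda^2} . \tag{15} $$

To bound the first term in the right-hand side, we use the next key lemma.

Lemma 3.1. For any $|z| \le 1/(2\hat{r})$ and $p \in [0, 1]$ such that $q = \mathbb{E}[(2p - 1)^2]$, we have

$$ \mathbb{E}_p\big[\, p\,(p e^{z} + \bar{p} e^{-z})^{\hat{r}} + \bar{p}\,(\bar{p} e^{z} + p e^{-z})^{\hat{r}} \,\big] \le e^{q\hat{r}z + (1/2)(3q\hat{r}^2 + \hat{r})z^2} . $$

Applying this inequality to (15) gives

$$ \mathbb{E}[e^{\lambda x^{(k+1)}}] \le e^{q\hat{\ell}\hat{r}m_k\lambda + (1/2)\left((3q\hat{\ell}\hat{r}^2 + \hat{\ell}\hat{r})m_k^2 + \hat{\ell}\hat{r}\sigma_k^2\right)\lambda^2} , $$

for $|\lambda| \le 1/(2m_k\hat{r})$. In the regime where $q\hat{\ell}\hat{r} \ge 1$ as per our assumption, $m_k$ is non-decreasing in $k$.
At iteration $k$, the above recursion holds for $|\lambda| \le \min\{1/(2m_1\hat{r}), \ldots, 1/(2m_{k-1}\hat{r})\} = 1/(2m_{k-1}\hat{r})$.
Hence, we get the following recursion for $m_k$ and $\sigma_k$ such that (10) holds for $|\lambda| \le 1/(2m_{k-1}\hat{r})$.

$$ \begin{aligned}
m_{k+1} &= q\hat{\ell}\hat{r}\, m_k , \\
\sigma_{k+1}^2 &= (3q\hat{\ell}\hat{r}^2 + \hat{\ell}\hat{r})\, m_k^2 + \hat{\ell}\hat{r}\,\sigma_k^2 .
\end{aligned} $$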
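The recursion above can be unrolled numerically and compared with the closed form obtained by summing the resulting geometric series, and the resulting Chernoff-type bound $e^{-\ell m_k^2/(2\hat{\ell}\sigma_k^2)}$ from (11) can then be evaluated. The following is a minimal sketch with hypothetical function names; it assumes $\hat{\ell} = \ell - 1$, the initialization $m_1 = \mu\hat{\ell}$, $\sigma_1^2 = 2\hat{\ell}$, and $\hat{\ell}\hat{r}q^2 \neq 1$.

```python
import math

def sub_gaussian_params(l_hat, r_hat, q, mu, k):
    """Unroll m_{k+1} = q l r m_k and sigma^2_{k+1} = (3 q l r^2 + l r) m_k^2 + l r sigma^2_k."""
    m = mu * l_hat
    s2 = 2.0 * l_hat
    for _ in range(k - 1):
        # Both updates use the old value of m.
        m, s2 = (q * l_hat * r_hat * m,
                 (3 * q * l_hat * r_hat ** 2 + l_hat * r_hat) * m * m + l_hat * r_hat * s2)
    return m, s2

def sigma2_closed_form(l_hat, r_hat, q, mu, k):
    """sigma_1^2 a^(k-1) + b c^(k-2) (1 - (a/c)^(k-1)) / (1 - a/c), assuming a != c."""
    a = l_hat * r_hat
    b = mu ** 2 * l_hat ** 2 * (3 * q * l_hat * r_hat ** 2 + l_hat * r_hat)
    c = (q * l_hat * r_hat) ** 2
    ratio = a / c
    return 2.0 * l_hat * a ** (k - 1) + b * c ** (k - 2) * (1 - ratio ** (k - 1)) / (1 - ratio)

def chernoff_bound(l_hat, r_hat, q, mu, k):
    """Evaluate exp(-ell m_k^2 / (2 l_hat sigma_k^2)) with ell = l_hat + 1 (an assumption)."""
    m, s2 = sub_gaussian_params(l_hat, r_hat, q, mu, k)
    ell = l_hat + 1
    return math.exp(-ell * m * m / (2.0 * l_hat * s2))

if __name__ == "__main__":
    args = (14, 14, 0.3, 0.3)
    for k in (1, 2, 5, 10):
        _, s2 = sub_gaussian_params(*args, k)
        print(k, s2, sigma2_closed_form(*args, k), chernoff_bound(*args, k))
```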

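With the initialization $m_1 = \mu\hat{\ell}$ and $\sigma_1^2 = 2\hat{\ell}$, we have $m_k = \mu\hat{\ell}(q\hat{\ell}\hat{r})^{k-1}$ for $k \in \{1, 2, \ldots\}$ and
$\sigma_k^2 = a\sigma_{k-1}^2 + b c^{k-2}$ for $k \in \{2, 3, \ldots\}$, with $a = \hat{\ell}\hat{r}$, $b = \mu^2\hat{\ell}^2(3q\hat{\ell}\hat{r}^2 + \hat{\ell}\hat{r})$, and $c = (q\hat{\ell}\hat{r})^2$. After some
algebra, it follows that $\sigma_k^2 = \sigma_1^2 a^{k-1} + b c^{k-2} \sum_{j=0}^{k-2} (a/c)^j$. For $\hat{\ell}\hat{r}q^2 \neq 1$, we have $a/c \neq 1$, whence
$\sigma_k^2 = \sigma_1^2 a^{k-1} + b c^{k-2}(1 - (a/c)^{k-1})/(1 - a/c)$. This finishes the proof of (10).

3.2 Proof of Lemma 3.1

By the fact that $a e^{b} + (1 - a) e^{-b} \le e^{(2a - 1)b + (1/2)b^2}$ for any $b \in \mathbb{R}$ and $a \in [0, 1]$, we have
$p e^{z} + \bar{p} e^{-z} \le e^{(2p - 1)z + (1/2)z^2}$ almost surely. Applying this inequality once again, we get

$$ \mathbb{E}\big[\, p\,(p e^{z} + \bar{p} e^{-z})^{\hat{r}} + \bar{p}\,(\bar{p} e^{z} + p e^{-z})^{\hat{r}} \,\big] \le \mathbb{E}\big[ e^{(2p - 1)^2\hat{r}z + (1/2)(2p - 1)^2\hat{r}^2 z^2} \big]\, e^{(1/2)\hat{r}z^2} . $$

Using the fact that $e^{a} \le 1 + a + 0.63a^2$ for $|a| \le 5/8$,

$$ \begin{aligned}
\mathbb{E}\big[ e^{(2p - 1)^2\hat{r}z + (1/2)(2p - 1)^2\hat{r}^2 z^2} \big]
&\le \mathbb{E}\Big[ 1 + (2p - 1)^2\hat{r}z + (1/2)(2p - 1)^2\hat{r}^2 z^2 + 0.63\big((2p - 1)^2\hat{r}z + (1/2)(2p - 1)^2\hat{r}^2 z^2\big)^2 \Big] \\
&\le 1 + q\hat{r}z + (3/2)\, q\hat{r}^2 z^2 \\
&\le e^{q\hat{r}z + (3/2) q\hat{r}^2 z^2} ,
\end{aligned} $$

for $|z| \le 1/(2\hat{r})$. This proves Lemma 3.1.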

3.3 Proof of Lemma 2.3

Each iteration of the iterative algorithm requires $O(m\ell)$ operations. To compute $x_{i \to j}$, we can first
compute $\sum_{j' \in \partial i} A_{ij'}\, y_{j' \to i}$; then each outgoing message can be computed in constant time by
subtracting the appropriate incoming message: $x_{i \to j} = \sum_{j' \in \partial i} A_{ij'}\, y_{j' \to i} - A_{ij}\, y_{j \to i}$ for each $j \in \partial i$.
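The constant-time trick for the outgoing messages of one task node can be sketched as follows. The dictionary-based data layout and the function names are assumptions made for this illustration; the naive version is included only to check that both computations agree.

```python
def task_messages(A_i, y_in):
    """Outgoing messages x_{i->j} for one task node i.

    A_i maps each neighboring worker j to the answer A_ij, and y_in maps j to the
    incoming message y_{j->i}. The full sum is computed once, and each outgoing
    message is obtained by subtracting the single term that belongs to j.
    """
    total = sum(A_i[j] * y_in[j] for j in A_i)
    return {j: total - A_i[j] * y_in[j] for j in A_i}

def task_messages_naive(A_i, y_in):
    """Reference version that recomputes the sum over all other neighbors for every j."""
    return {j: sum(A_i[k] * y_in[k] for k in A_i if k != j) for j in A_i}

if __name__ == "__main__":
    A_i = {0: 1, 1: -1, 2: 1}
    y_in = {0: 0.5, 1: 2.0, 2: -1.0}
    print(task_messages(A_i, y_in))
    print(task_messages_naive(A_i, y_in))
```

The first version touches each edge a constant number of times, while the naive version costs time quadratic in the degree of the node.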

We need $O(\log(q/\mu^2)/\log(q^2\hat{\ell}\hat{r}))$ iterations to ensure an error bound of the order given in
Corollary 2.2. Using the definition of $\sigma_k^2$ given above, notice that the relative error is bounded by



< 



{Irq 



For $\hat{\ell}\hat{r}q^2 > 1$ as per our assumption, $k = O(\log(q/\mu^2)/\log(\hat{\ell}\hat{r}q^2))$ iterations suffice to ensure that
the right-hand side is bounded by a constant.



3.4 Proof of Lemma 2.4

In this minimax scenario, we consider a distribution of the worker reliability from the spammer- 
hammer model, and prove a lower bound on the error achieved by an oracle estimator that knows 
the reliability of every worker. 

Under the spammer-hammer model there are only two kinds of workers: a hammer with $p_j = 1$
always gives the right answer, and a spammer with $p_j = 1/2$ always gives a random answer. Each
worker is a hammer with probability $q$ and a spammer with probability $1 - q$, such that $\mathbb{E}[(2p - 1)^2] = q$.

Let us consider an oracle estimator that makes the optimal decision based on information
provided by an oracle. The oracle gives information on which worker is a spammer and which one
is a hammer. Then, this oracle estimator only makes mistakes on tasks whose neighbors are all
spammers, in which case the estimator flips a fair coin to make a decision. If we denote the degree
of a task node $i$ by $\ell_i$, the error rate $\mathbb{P}(s_i \neq \hat{s}_i)$ is $(1/2)(1 - q)^{\ell_i}$. Note that no algorithm with access
to only $\{A_{ij}\}$ can produce a more accurate estimate, when nature chooses the worst case $\{s_i\}$, than
the oracle estimator. Therefore, for any estimate $\hat{s}$ that is a function of $\{A_{ij}\}_{(i,j) \in E}$, we have the
following minimax bound.

$$ \inf_{\mathrm{Algo}} \; \sup_{s,\, f \in \mathcal{F}(q)} d_m(s, \hat{s}_{G,\mathrm{Algo}}) \;\ge\; \frac{1}{m} \sum_{i \in [m]} \frac{1}{2}(1 - q)^{\ell_i} , $$

for any graph $G$ that has $m$ task nodes and degree $\ell_i$ for task node $t_i$. By convexity, the right-hand
side is at least $(1/2)(1 - q)^{|E|/m}$, where $|E| = \sum_i \ell_i$. This proves the desired claim. Let
us emphasize that this minimax lower bound is completely general and holds for any graph, regular
or irregular.
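The convexity step can be checked on a small example. The sketch below uses hypothetical function names and an arbitrary degree sequence: it evaluates the oracle estimator's average error on an irregular graph and compares it with the value $(1/2)(1-q)^{|E|/m}$ obtained when the same number of edges is spread evenly.

```python
def oracle_error(degrees, q):
    """Average oracle error (1/m) * sum_i (1/2)(1 - q)^{l_i} for task degrees l_i."""
    m = len(degrees)
    return sum(0.5 * (1.0 - q) ** d for d in degrees) / m

def regular_lower_bound(degrees, q):
    """(1/2)(1 - q)^{|E|/m}: the same budget |E| spread evenly over the m tasks."""
    avg_degree = sum(degrees) / len(degrees)
    return 0.5 * (1.0 - q) ** avg_degree

if __name__ == "__main__":
    degrees = [1, 3, 5, 7]     # illustrative irregular degree sequence
    print(oracle_error(degrees, 0.3), regular_lower_bound(degrees, 0.3))
    print(oracle_error([4, 4, 4, 4], 0.3), regular_lower_bound([4, 4, 4, 4], 0.3))
```

Since $(1-q)^{x}$ is convex in $x$, Jensen's inequality guarantees that the irregular graph is never better than the regular one with the same number of edges, with equality when all degrees coincide.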



3.5 Proof of Lemma 12.51 

Now, consider a naive majority voting algorithm. Majority voting simply follows what the majority 
of workers agree on. In formula, Si = sign(^^g^- Aij), where di denotes the neighborhood of node 
i in the graph. It makes a random choice when there is a tie. When we have many spammers in 
the crowd, majority voting is prone to make mistakes since it gives the same weight to both the 
estimates provided by spammers and that of diligent workers. 

We want to compute a lower bound on $\mathbb{P}(\hat{s}_i \neq s_i)$. Let $x_i = \sum_{j \in \partial i} A_{ij}$, where $\partial i$ denotes 
the neighborhood of task node $t_i$. Assuming $s_i = +1$ without loss of generality, the error rate is 
lower bounded by $\mathbb{P}(x_i < 0)$. After rescaling, $(1/2)(x_i + l)$ is a standard binomial random variable 
$\mathrm{Binom}(l, \alpha)$, where $l$ is the number of neighbors of the node $i$, $\alpha = \mathbb{E}[p_j]$, and by assumption each 
$A_{ij}$ is one with probability $\alpha$. 

It follows that $\mathbb{P}(x_i = -l + 2k) = \frac{l!}{(l - k)!\, k!} \alpha^k (1 - \alpha)^{l - k}$. Further, for $k \le \alpha l - 1$, the 
probability mass function is monotonically increasing. Precisely, 

$$\frac{\mathbb{P}(x_i = -l + 2(k + 1))}{\mathbb{P}(x_i = -l + 2k)} = \frac{\alpha (l - k)}{(1 - \alpha)(k + 1)} \ge \frac{\alpha (l - \alpha l + 1)}{(1 - \alpha) \alpha l} > 1 ,$$ 

where we used the fact that the above ratio is decreasing in $k$, whence the minimum is achieved at 
$k = \alpha l - 1$ under our assumption. 
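
The monotonicity claim can be checked numerically. The sketch below is illustrative only: it evaluates the binomial probability mass function and the ratio of consecutive masses, with `l`, `alpha` and `k` playing the roles of the degree, the mean reliability and the number of +1 answers. The function names are hypothetical.

```python
from math import comb


def pmf(l, alpha, k):
    """P(x_i = -l + 2k): exactly k of the l answers are +1."""
    return comb(l, k) * alpha ** k * (1 - alpha) ** (l - k)


def consecutive_ratio(l, alpha, k):
    """Closed form of pmf(l, alpha, k + 1) / pmf(l, alpha, k)."""
    return alpha * (l - k) / ((1 - alpha) * (k + 1))
```

With `l = 20` and `alpha = 0.6`, the ratio stays above one for every `k` up to `alpha * l - 1 = 11` and drops below one at `k = 12`.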

First, we give an outline of the proof strategy, using a simple Gaussian example. In the limit 
as $l$ grows large, $x_i$ converges in distribution to a Gaussian random variable with mean $(2\alpha - 1) l$ 
and variance $4 l \alpha (1 - \alpha)$. Note that the Gaussian pdf 
$g(z) = (1/\sqrt{2\pi \mathrm{Var}(x_i)}) \exp\{-(1/2)(z - \mathbb{E}[x_i])^2 / \mathrm{Var}(x_i)\}$ is monotonically increasing for 
$z < \mathbb{E}[x_i]$. Then, $\mathbb{P}(x_i < 0)$ can be lower bounded by 

$$\mathbb{P}(x_i < 0) \ge \sqrt{l}\, g(-\sqrt{l}) 
  = \frac{1}{\sqrt{8\pi\alpha(1 - \alpha)}} \exp\Big\{ -\frac{((2\alpha - 1) l + \sqrt{l})^2}{8 \alpha (1 - \alpha) l} \Big\} 
  = \exp\Big\{ -C_1 \big( l(2\alpha - 1)^2 + O(1 + \sqrt{l(2\alpha - 1)^2}) \big) \Big\} ,$$ 

for $1/2 < \mathbb{E}[p_j] < 1$ and some constant $C_1$. This shows that under the Gaussian assumption, we need 
$l \sim \log(1/\epsilon)/(2\alpha - 1)^2$ to be able to ensure that the probability of error is less than $\epsilon \in (0, 1/2)$. 
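
To see where the final form of the Gaussian bound comes from, the exponent of $\sqrt{l}\, g(-\sqrt{l})$, with mean $(2\alpha - 1) l$ and variance $4 l \alpha (1 - \alpha)$, can be expanded term by term. This is only elementary algebra, written out here in LaTeX under the standing assumption $\alpha > 1/2$:

```latex
\frac{\bigl((2\alpha-1)\,l + \sqrt{l}\bigr)^2}{8\alpha(1-\alpha)\,l}
  = \frac{l(2\alpha-1)^2 + 2\sqrt{l}\,(2\alpha-1) + 1}{8\alpha(1-\alpha)} ,
\qquad
2\sqrt{l}\,(2\alpha-1) = 2\sqrt{l(2\alpha-1)^2} .
```

The leading term is proportional to $l(2\alpha - 1)^2$, and the two remaining terms make up the $O(1 + \sqrt{l(2\alpha - 1)^2})$ correction.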

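Using a similar strategy, we can prove a concrete bound on $\mathbb{P}(x_i < 0)$ for binomial $x_i$. Let 
us assume that $l$ is even, so that $x_i$ takes even values. When $l$ is odd, the same analysis works, 
but $x_i$ takes odd values. Our strategy is to use a simple bound: $\mathbb{P}(x_i < 0) \ge k\, \mathbb{P}(x_i = -2k)$, 
which holds because, by the assumption that $\alpha = \mathbb{E}[p_j] > 1/2$, the probability mass function is 
increasing over the negative values of $x_i$. For an appropriate choice of $k = \sqrt{l}$, the right-hand side 
closely approximates the error probability, as we saw in the Gaussian example. By definition of $x_i$, 
it follows that 

$$\mathbb{P}(x_i = -2\sqrt{l}) = \binom{l}{l/2 + \sqrt{l}} \alpha^{l/2 - \sqrt{l}} (1 - \alpha)^{l/2 + \sqrt{l}} .$$   (16) 

Applying Stirling's approximation, we can show that 

$$\binom{l}{l/2 + \sqrt{l}} \ge \frac{C_2}{\sqrt{l}}\, 2^l ,$$   (17) 

for some numerical constant $C_2$. We are interested in the case where worker quality is low, that 
is, $\alpha$ is close to $1/2$. Accordingly, for the second and third terms in (16), we expand in terms of 
$2\alpha - 1$. 

$$\log\big( \alpha^{l/2 - \sqrt{l}} (1 - \alpha)^{l/2 + \sqrt{l}} \big) 
  = \Big(\frac{l}{2} - \sqrt{l}\Big) \big( \log(1 + (2\alpha - 1)) - \log(2) \big) + \Big(\frac{l}{2} + \sqrt{l}\Big) \big( \log(1 - (2\alpha - 1)) - \log(2) \big) 
  = -l \log(2) - \Big( \frac{l(2\alpha - 1)^2}{2} + 2\sqrt{l}\,(2\alpha - 1) \Big) + O\big( l(2\alpha - 1)^4 + \sqrt{l}\,(2\alpha - 1)^3 \big) .$$   (18) 

We can substitute (17) and (18) in (16) to get the following bound: 

$$\mathbb{P}(x_i < 0) \ge \exp\{ -C_3 ( l(2\alpha - 1)^2 + 1 ) \} ,$$   (19) 

for some numerical constant $C_3$. 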
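
The simple bound $\mathbb{P}(x_i < 0) \ge k\, \mathbb{P}(x_i = -2k)$ used in this argument can be compared against the exact binomial tail. The sketch below is an illustration under the same assumptions (even `l`, `alpha > 1/2`); the function names are hypothetical and no value of the constants is implied.

```python
from math import comb, isqrt


def pmf_x(l, alpha, x):
    """P(x_i = x) for x = -l + 2k, where k counts the +1 answers."""
    k = (x + l) // 2
    return comb(l, k) * alpha ** k * (1 - alpha) ** (l - k)


def exact_error(l, alpha):
    """Exact P(x_i < 0) for even l: sum the mass on x = -l, -l + 2, ..., -2."""
    return sum(pmf_x(l, alpha, x) for x in range(-l, 0, 2))


def simple_bound(l, alpha, k):
    """Lower bound k * P(x_i = -2k), valid because the mass increases up to x = -2."""
    return k * pmf_x(l, alpha, -2 * k)
```

For example, `simple_bound(100, 0.55, isqrt(100))` uses `k = 10` and never exceeds `exact_error(100, 0.55)`.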

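Now, let $l_i$ denote the degree of task node $i$. Then for any $\{s_i\} \in \{\pm 1\}^m$, any distribution of $p$ 
such that $\mu = \mathbb{E}[2p - 1] = 2\alpha - 1$, and any graph $G$ with $m$ task nodes, the following lower bound 
is true. 

$$d_m(s, \hat{s}_{G,\mathrm{Majority}}) \ge \frac{1}{m} \sum_{i=1}^{m} e^{-C_3 (l_i \mu^2 + 1)} .$$ 

By convexity, this implies 

$$d_m(s, \hat{s}_{G,\mathrm{Majority}}) \ge e^{-C_3 (l \mu^2 + 1)} ,$$ 

where $l = (1/m) \sum_i l_i$ is the average task degree. Under the spammer-hammer model, where $\mu = q$, this gives 

$$\inf_{G} \sup_{s} d_m(s, \hat{s}_{G,\mathrm{Majority}}) \ge e^{-C_3 (l q^2 + 1)} .$$ 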

This finishes the proof of the lemma. 



3.6 Proof of Lemma 2.6 



To prove the minimax bound, let nature choose the following distribution of the workers, which we 
denote the $(a, q)$-model. Under this model, we assume that $q \le a^2$ and 

$$p_j = \begin{cases} 1/2 & \text{with probability } 1 - (q/a^2) , \\ (1/2)(1 + a) & \text{with probability } q/a^2 , \end{cases}$$ 

such that $\mathbb{E}[(2p_j - 1)^2] = q$. We assume that each task is chosen i.i.d. and is a positive task with 
probability $b_i$: 

$$s_i = \begin{cases} +1 & \text{with probability } b_i , \\ -1 & \text{with probability } 1 - b_i , \end{cases}$$ 

and assume $b_i \ge 1/2$ without loss of generality. We consider an oracle estimator that knows the 
reliability of every worker (all the $p_j$'s), the values of the $b_i$'s, the values of $a$ and $q$, and can choose 
any adaptive task allocation scheme. After we run this oracle estimator, let $L_i$ be the random 
variable representing the total number of workers assigned to task $t_i$. The oracle estimator knows 
all the values necessary to compute the error probability of each task. Let $E_i$ be the random 
variable representing the error probability as computed by the oracle estimator, conditioned on the 
$L_i$ responses we get from the workers and their reliabilities $p_j$. We are interested in how large the 
average budget $(1/m) \sum_i \mathbb{E}[L_i]$ must be in order to achieve $(1/m) \sum_i \mathbb{E}[E_i] \le \epsilon$. In the following 
we will show that for any task $t_i$, independent of which task allocation scheme is used, it is necessary 
that 

$$\mathbb{E}[L_i] \ge \frac{C}{q} \log\Big( \frac{1}{\mathbb{E}[E_i]} \Big) ,$$   (20) 

for some constant $C$ that only depends on $b_i$. By convexity of $\log(1/x)$ and Jensen's inequality, 
this implies that 

$$\frac{1}{m} \sum_{i=1}^{m} \mathbb{E}[L_i] \ge \frac{C}{q} \log\Big( \frac{1}{(1/m) \sum_{i=1}^{m} \mathbb{E}[E_i]} \Big) .$$ 

Hence, in order to achieve $(1/m) \sum_i \mathbb{P}(\hat{s}_i \neq s_i) \le \epsilon$, it is necessary for this oracle estimator to spend 
a budget which scales as $(1/q) \log(1/\epsilon)$ for each task. This finishes the proof of the lemma. 



Now, we are left to prove (20). Focusing on a single task $t_i$, since spammers with $p_j = 1/2$ give 
us no information about the task, we only need to consider the responses from the reliable workers. 
Let us consider a case where, after getting responses from $l_i$ reliable workers (with $p_j = (1/2)(1 + a)$), 
the oracle estimator computes $\hat{s}_i$ according to an optimal majority rule based on the values of $b_i$ and 
$a$, and computes the probability of making an error. Although this probability of error depends on 
what the responses are, there is a simple way to compute a lower bound on the probability of error 
regardless of the actual responses we get. 

In the best case, all reliable workers would have agreed on the task $t_i$. Further, since we assumed 
that $b_i \ge 1/2$, the probability of making an error is smallest if all the reliable workers agreed that 
the task is a positive task, in which case $\hat{s}_i = +1$. Then, 
$\mathbb{P}(s_i = +1 \text{ and } l_i \text{ reliable workers responded } +) = b_i ((1 + a)/2)^{l_i}$, and 
$\mathbb{P}(s_i = -1 \text{ and } l_i \text{ reliable workers responded } +) = (1 - b_i)((1 - a)/2)^{l_i}$. Hence, 

$$\mathbb{P}(\text{error} \mid l_i \text{ responses from reliable workers}) 
  \ge \mathbb{P}(s_i = -1 \mid l_i \text{ reliable workers responded } +) 
  = \frac{(1 - b_i)((1/2)(1 - a))^{l_i}}{(1 - b_i)((1/2)(1 - a))^{l_i} + b_i ((1/2)(1 + a))^{l_i}} 
  = \frac{1}{1 + \frac{b_i}{1 - b_i} \big( (1 + a)/(1 - a) \big)^{l_i}} .$$ 
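
The posterior computation above is a direct application of Bayes' rule, and the closed form can be checked against it numerically. The sketch below is illustrative: `b`, `a` and `l` stand for the prior $b_i$, the reliability parameter $a$, and the number $l_i$ of unanimous reliable answers, and the function names are hypothetical.

```python
def posterior_negative(b, a, l):
    """P(s_i = -1 | all l reliable workers answered +), computed by Bayes' rule."""
    joint_positive = b * ((1 + a) / 2) ** l        # task is +1 and all l answer +
    joint_negative = (1 - b) * ((1 - a) / 2) ** l  # task is -1 and all l answer +
    return joint_negative / (joint_positive + joint_negative)


def closed_form(b, a, l):
    """Equivalent expression 1 / (1 + (b / (1 - b)) * ((1 + a) / (1 - a))^l)."""
    return 1.0 / (1.0 + (b / (1 - b)) * ((1 + a) / (1 - a)) ** l)
```

Both functions agree up to floating-point error, and the value decays geometrically in `l` at rate `(1 - a) / (1 + a)`, which is why the required number of reliable answers grows only logarithmically in the inverse of the target error.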

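For a given $l_i$, let $E_i = \mathbb{E}[I(\hat{s}_i \neq s_i) \mid l_i \text{ responses from reliable workers}]$ be the random variable 
representing the conditional error rate. The above inequality implies that 

$$E_i \ge \frac{1}{1 + \frac{b_i}{1 - b_i} \big( (1 + a)/(1 - a) \big)^{l_i}} ,$$ 

almost surely. Let $L_i$ be the number of workers we need to recruit to see $l_i$ reliable workers. Since 
on average we need to recruit $a^2/q$ workers to see one reliable worker, we have that, almost surely, 

$$\mathbb{E}[L_i \mid l_i] = \frac{a^2}{q}\, l_i 
  \ge \frac{a^2}{q \log((1 + a)/(1 - a))} \log\Big( \frac{(1 - E_i)(1 - b_i)}{E_i b_i} \Big) 
  \ge \frac{a^2}{q \log((1 + a)/(1 - a))} \log\Big( \frac{1 - b_i}{2 E_i b_i} \Big) ,$$ 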

where, in the last inequality, we used the fact that whenever the oracle estimator makes an estimate, 
the error rate is at most $1/2$, since a random guess guarantees an error probability of $1/2$. Since this 
inequality holds almost surely and for all values of $l_i$, it also holds in expectation (using Jensen's 
inequality for the convex function $\log(1/x)$), such that 

$$\mathbb{E}[L_i] \ge \frac{a^2}{q \log((1 + a)/(1 - a))} \log\Big( \frac{1 - b_i}{2 \mathbb{E}[E_i] b_i} \Big) .$$ 

Maximizing over all choices of $a \in (0, 1)$, we get 

$$\mathbb{E}[L_i] \ge \frac{0.29}{q} \log\Big( \frac{1 - b_i}{2 \mathbb{E}[E_i] b_i} \Big) ,$$ 

which in particular is true with $a = 0.8$. For this choice of $a$, the result holds in the regime where 
$q \le 0.64$, such that the probability of seeing a reliable worker under the $(a, q)$-model is at most one. 
For any $b_i$ bounded away from one, as per our assumption, this implies (20). 
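
The constant 0.29 is the value of $a^2 / \log((1 + a)/(1 - a))$ near its maximum. The following sketch, with hypothetical function names and a simple grid search chosen only for illustration, evaluates that factor and the resulting budget lower bound.

```python
from math import log


def budget_constant(a):
    """The factor a^2 / log((1 + a) / (1 - a)) multiplying (1/q) log(...)."""
    return a * a / log((1 + a) / (1 - a))


def best_a(grid_size=999):
    """Grid search over a in (0, 1) for the value maximizing budget_constant."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return max(grid, key=budget_constant)


def budget_lower_bound(q, b, expected_error, a=0.8):
    """Lower bound on E[L_i]; requires q <= a^2 and 1 - b > 2 * expected_error * b."""
    return budget_constant(a) / q * log((1 - b) / (2 * expected_error * b))
```

Evaluating `budget_constant(0.8)` gives approximately 0.291, and the grid search returns a maximizer close to 0.8, consistent with the choice made in the proof.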

3.7 Proof of Lemma 2.7 

The probability of making an error after one iteration of our algorithm for node $i$ is 
$\mathbb{P}(\hat{s}_i \neq s_i) \le \mathbb{P}(\hat{x}^{(1)} \le 0)$, where $\hat{x}^{(1)}$ is as defined earlier and we assume without loss of 
generality that $s_i = 1$. From (11) it follows that 



E 



e 



where we choose $\lambda = -\mu/2$. By Chernoff's inequality, this implies the lemma. 